24-11-2003

Differential Diagnosis of Colorectal Cancer and other Diseases of the Colon 

The present invention provides biomolecules and the use of these biomolecules for the differential
diagnosis of colorectal cancer or a non-malignant disease of the large intestine. In specific
embodiments, the biomolecules are characterised by mass profiles, generated by contacting a test
and/or biological sample with an anion exchange surface under specific binding conditions and
detecting said biomolecules using gas phase ion spectrometry. The biomolecules used according to the
invention are preferably proteins or polypeptides. Furthermore, preferred test and/or biological
samples are blood serum samples and are of human origin.

BACKGROUND TO THE INVENTION

Colorectal cancer is the fourth most common cancer in the world to date, and accounts for
approximately 200,000 deaths per year in Europe and the US alone. Although colorectal cancer
generally affects both men and women equally (currently at 9.4% and 10.1% of incident cancer,
respectively), its distribution as a leading cause of death in men and women is disproportionate.
Whereas colorectal cancer is the fourth leading cancer-related cause of death in men (following lung,
stomach and prostate cancer), in women it takes second place to breast cancer. Furthermore, colorectal
cancer is more prevalent in developed countries exhibiting more westernised lifestyle practices.

Familial and hereditary factors have been observed to play primary roles in the cause of colorectal
cancers. In addition, a number of other factors have been shown to be associated with an increased risk
of developing colorectal cancer, namely the presence of adenomatous polyps, history/presence of
inflammatory bowel disease, diets rich in animal fats and significantly decreased consumption of raw
or fresh vegetables (especially leafy green vegetables, cruciferous vegetables, as well as allium
vegetables such as garlic, onions, chives).

Significant differences exist regarding the survival of patients affected by colorectal cancer according
to the stage at which the disease is diagnosed. Most patients exhibit symptoms such as rectal
bleeding, pain, abdominal distension or weight loss only after the disease is in its advanced stages,
leaving few therapeutic options available. Clearly, early detection of primary, metastatic, and
recurrent disease can significantly impact the prognosis of individuals suffering from colorectal
cancer. Diagnosis at an early stage, prior to lymph-node spread, can significantly improve the rate of
survival as compared to a diagnosis established at a later stage of the disease, since the therapies used
to treat colorectal cancer are stage-dependent.

To date, the fecal occult blood test (FOBT), flexible sigmoidoscopy, double contrast barium enema, and
colonoscopy are the primary tools utilised to detect colorectal cancer at its early stages. Among these,
only FOBT, which is based on the high probability that blood found within a patient's fecal
(heme-positive) sample arises from tumours found within the large intestine, is non-invasive, simple and
relatively inexpensive. Unfortunately, this method of early detection has several drawbacks.

Firstly, a positive FOBT result leads to further examination, mainly colonoscopy, an extremely
uncomfortable, invasive diagnostic method which is expensive and carries a serious complication rate
of one per 5,000 examinations. Colonoscopy, as a follow-up diagnostic method, might prove to be
effective in confirming colorectal cancer within a patient provided that the FOBT results indeed reflect
the presence of the disease. Unfortunately, more often than not this is not the case, since only 12% of the
patients with a heme-positive fecal sample are diagnosed with cancer or large polyps at the time of
colonoscopy. Furthermore, physicians frequently fail to properly instruct their patients on how fecal
samples should be collected. Normally, patients are told to adhere to specific dietary guidelines and to
avoid taking medication known to induce gastrointestinal bleeding. Should the patient not be
instructed properly, or not adhere to the strict protocol, the chance of obtaining a false-positive FOBT
result is greatly increased. The false-positive FOBT result will subsequently send the patient for a
confirmatory diagnosis, which is neither necessary, inexpensive, nor pleasant. Secondly, a
false-negative result holds even greater consequence since a patient possessing colorectal cancer, in
this case, would not be diagnosed as having the disease and would be sent home without proper
therapy.

Currently, many groups are utilising proteomic technologies to comparatively analyse the differences
in protein levels in colorectal cancers vs. normal large intestinal tissue in the hope of developing
diagnostic markers that could assist the practicing clinician in the management of colorectal cancer.
At present, the standard method of proteome analysis is two-dimensional (2D) gel
electrophoresis, which has been an invaluable tool for the separation and identification of proteins.
This method is also effective in identifying aberrantly expressed proteins in a variety of tissue
samples. Unfortunately, the analysis of data generated by 2D-gel electrophoresis is labour-intensive
and requires large quantities of material for protein analysis, thereby rendering it impractical for
routine clinical use.

With the introduction of SELDI (surface enhanced laser desorption ionization), a modification of
MALDI-TOF (matrix-assisted laser desorption ionization/time of flight) mass spectrometry that
allows for the simultaneous analysis of multiple proteins in one sample, a more practical tool has
become available. Small amounts of proteins can be directly bound to a biochip carrying spots with
different types of chromatographic material, including those with hydrophobic, hydrophilic,
cation-exchanging and anion-exchanging characteristics. This approach has proven to be very useful for
identifying proteins and protein patterns (profiles) in various biological fluids, including serum, urine or
pancreatic juice.

To date, specific biomarkers for the detection of breast and prostate cancers (patents WO0223200,
WO03058198 and WO0125791 from Ciphergen, respectively) have been identified using the above
mentioned SELDI technology. Unfortunately, due to the nature of sample testing, the biomarkers
identified can only be used to diagnose a patient as having a specific cancer (either breast or prostate)
versus not having the disease at all. For example, whereas the test samples analysed in WO03058198
(Ciphergen) and WO0223200 (Ciphergen) were taken from patients with late-stage breast cancer
(stages III and IV), the control samples were taken from patients with undetectable breast cancer. The
biomarkers identified are neither grade-specific nor can they detect the disease at its earliest stages
(stages I and II), and thereby would not allow for effective patient-specific treatment of the disease.
Moreover, biomarkers that can differentiate between the presence of a colorectal cancer, a
non-malignant disease of the large intestine, or an acute and chronic inflammation of the epithelium have
not yet been identified.

Accordingly, there is a critical need to develop a simple, non-invasive, reliable and inexpensive
method for the effective detection of colorectal cancer at its early stages. Preferably, such a diagnostic
method should be able to detect early-stage colorectal cancer, as well as distinguish between the later
stages or grades of the disease. With such valuable information, medical practitioners would be able to
tailor patient therapies for optimum treatment of the disease.

The present invention addresses this difficulty with the development of a non-invasive diagnostic tool
for the differential diagnosis of colorectal cancer and non-malignant diseases of the large intestine. 

SUMMARY OF THE INVENTION 

The present invention relates to methods for the differential diagnosis of colorectal cancer or
non-malignant disease of the large intestine by detecting one or more differentially expressed biomolecules
within a test sample of a given subject, comparing results with samples from healthy subjects, subjects
having a precancerous lesion of the large intestine, subjects having a colorectal cancer, subjects having
a metastasised colorectal cancer, or subjects having a non-malignant disease of the large intestine,
wherein the comparison allows for the differential diagnosis of a subject as healthy, having a
precancerous lesion of the large intestine, having a colorectal cancer, having a metastasised colorectal
cancer or a non-malignant disease of the large intestine.

The present invention provides a method for the differential diagnosis of a colorectal cancer and/or a
non-malignant disease of the large intestine, in vitro, comprising obtaining a test sample from a
subject, contacting the test sample with a biologically active surface under specific binding conditions,
allowing for biomolecules present within the test sample to bind to the biologically active surface,
detecting one or more bound biomolecules using mass spectrometry thereby generating a mass profile
of said test sample, transforming data into a computer-readable form, and comparing said mass profile
against a database containing mass profiles specific for healthy subjects, subjects having a
precancerous lesion of the large intestine, subjects having colorectal cancer, subjects having
metastasised colorectal cancers, or subjects having a non-malignant disease of the large intestine,
wherein the comparison allows for the differential diagnosis of a subject as healthy, having a
precancerous lesion of the large intestine, having a colorectal cancer, having a metastasised colorectal
cancer or a non-malignant disease of the large intestine.

In one embodiment the invention provides a database comprising mass profiles of biological
samples from healthy subjects, subjects having a precancerous lesion of the large intestine, subjects
having a colorectal cancer, subjects having a metastasised colorectal cancer, or subjects having a
non-malignant disease of the large intestine.

Within the same embodiment the database is generated by obtaining biological samples from healthy
subjects, subjects having a precancerous lesion of the large intestine, subjects having a colorectal
cancer, subjects having a metastasised colorectal cancer, and subjects having a non-malignant disease
of the large intestine, contacting said biological samples with a biologically active surface under
specific binding conditions, allowing the biomolecules within the biological sample to bind to said
biologically active surface, detecting one or more bound biomolecules using mass spectrometry
thereby generating a mass profile of said biological samples, transforming data into a
computer-readable form, and applying a mathematical algorithm to classify the mass profiles as
specific for healthy subjects, subjects having a precancerous lesion of the large intestine, subjects
having colorectal cancer, subjects having metastasised colorectal cancer, and subjects having a
non-malignant disease of the large intestine.

In specific embodiments, the present invention provides biomolecules having a molecular mass
selected from the group consisting of 2020 Da ± 10 Da, 2049 Da ± 10 Da, 2270 Da ± 11 Da, 2508 Da
± 13 Da, 2732 Da ± 14 Da, 3026 Da ± 15 Da, 3227 Da ± 17 Da, 3326 Da ± 17 Da, 3456 Da ± 17 Da,
3946 Da ± 20 Da, 4103 Da ± 21 Da, 4242 Da ± 21 Da, 4295 Da ± 21 Da, 4359 Da ± 22 Da, 4476 Da ±
22 Da, 4546 Da ± 23 Da, 4607 Da ± 23 Da, 4719 Da ± 24 Da, 4830 Da ± 24 Da, 4865 Da ± 24 Da,
4963 Da ± 25 Da, 5112 Da ± 26 Da, 5226 Da ± 26 Da, 5493 Da ± 27 Da, 5648 Da ± 28 Da, 5772 Da ±
29 Da, 5854 Da ± 29 Da, 6446 Da ± 32 Da, 6644 Da ± 33 Da, 6852 Da ± 34 Da, 6897 Da ± 34 Da,
6999 Da ± 35 Da, 7575 Da ± 38 Da, 7657 Da ± 38 Da, 8076 Da ± 40 Da, 8215 Da ± 41 Da, 8474 Da ±
42 Da, 8574 Da ± 43 Da, 8702 Da ± 44 Da, 8780 Da ± 44 Da, 8922 Da ± 45 Da, 9078 Da ± 45 Da,
9143 Da ± 46 Da, 9201 Da ± 46 Da, 9359 Da ± 47 Da, 9425 Da ± 47 Da, 9581 Da ± 48 Da, 9641 Da ±
48 Da, 9718 Da ± 49 Da, 9930 Da ± 50 Da, 10215 Da ± 51 Da, 10369 Da ± 52 Da, 10440 Da ± 52 Da,
10594 Da ± 53 Da, 11216 Da ± 56 Da, 11464 Da ± 57 Da, 11547 Da ± 58 Da, 11693 Da ± 58 Da,
11905 Da ± 60 Da, 12470 Da ± 62 Da, 12619 Da ± 63 Da, 12828 Da ± 64 Da, 13290 Da ± 66 Da,
13632 Da ± 68 Da, 13784 Da ± 69 Da, 13983 Da ± 70 Da, 14798 Da ± 74 Da, 15005 Da ± 75 Da,
15140 Da ± 76 Da, 15350 Da ± 77 Da, 15879 Da ± 79 Da, 15957 Da ± 80 Da, 16104 Da ± 81 Da,
16164 Da ± 81 Da, 16953 Da ± 85 Da, 17263 Da ± 86 Da, 17397 Da ± 87 Da, 17617 Da ± 88 Da,
17766 Da ± 89 Da, 17890 Da ± 89 Da, 18115 Da ± 91 Da, 18390 Da ± 92 Da, 22338 Da ± 112 Da,
22466 Da ± 112 Da, 22676 Da ± 113 Da, 22951 Da ± 115 Da, 24079 Da ± 120 Da, 28055 Da ± 140
Da, and 28259 Da ± 141 Da. The biomolecules having said molecular masses are detected by
contacting a test and/or biological sample with a biologically active surface comprising an adsorbent
under specific binding conditions and further analysed by gas phase ion spectrometry. Preferably the
adsorbent used is comprised of positively charged quaternary ammonium groups (anion exchange
surface).
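
As an illustration of how the mass windows listed above might be applied to an observed peak, the following minimal sketch checks whether a measured mass falls within the stated tolerance of a listed biomolecule. The function name `match_markers` and the reduced marker table are hypothetical and chosen only for illustration; the stated tolerances appear to be roughly 0.5% of the nominal mass, but the sketch simply uses the listed values.

```python
# Illustrative subset of the (nominal mass, tolerance) pairs listed above, in daltons.
# The full list would be entered the same way; this subset is an assumption for brevity.
MARKERS = [(2020, 10), (2049, 10), (4830, 24), (4865, 24), (6644, 33), (6852, 34)]


def match_markers(observed_mass, markers=MARKERS):
    """Return the nominal masses whose +/- window contains observed_mass."""
    return [mass for mass, tol in markers if abs(observed_mass - mass) <= tol]


if __name__ == "__main__":
    print(match_markers(6650.0))  # falls inside the 6644 Da window only
    print(match_markers(4850.0))  # falls inside two overlapping windows
    print(match_markers(2035.0))  # falls between two windows, so no match
```

Because neighbouring windows can overlap (for example 4830 Da ± 24 Da and 4865 Da ± 24 Da), the sketch returns every matching nominal mass rather than a single one.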

In specific embodiments, the invention provides specific binding conditions for the detection of
biomolecules within a sample. In preferred embodiments, a sample is diluted 1:5 in a denaturation
buffer consisting of 7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT, and 2% Ampholine, and then
diluted again 1:10 in binding buffer consisting of 0.1 M Tris-HCl, 0.02% Triton X-100 at pH 8.5 at 0
to 4°C. The treated sample is then contacted with a biologically active surface comprising positively
charged (cationic) quaternary ammonium groups (anion exchanging), incubated for 120 minutes at 20
to 24°C, and the bound biomolecules are detected using gas phase ion spectrometry.
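
The combined effect of the two dilution steps above can be worked out with a short calculation. This is a sketch under a stated assumption: a dilution written as 1:n is read here as one part sample in n parts total volume. The text does not say which convention is intended, and the helper name `overall_dilution` is hypothetical.

```python
from fractions import Fraction


def overall_dilution(*steps):
    """Multiply successive 1:n dilution steps into a single overall fraction.

    Assumes each step n means one part sample in n parts total volume.
    """
    factor = Fraction(1)
    for n in steps:
        factor *= Fraction(1, n)
    return factor


if __name__ == "__main__":
    # 1:5 in denaturation buffer, then 1:10 in binding buffer
    print(overall_dilution(5, 10))  # 1/50
```

Under that reading, the sample reaches the surface at an overall dilution of 1:50.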

In an alternative embodiment, the invention provides a method for the differential diagnosis of a
colorectal cancer and/or a non-malignant disease of the large intestine comprising detecting one or
more differentially expressed biomolecules within a sample. This method comprises obtaining a test
sample from a subject, contacting said sample with a binding molecule specific for a differentially
expressed polypeptide, detecting an interaction between the binding molecule and its specific
polypeptide, wherein the detection of an interaction indicates the presence or absence of said
polypeptide, thereby allowing for the differential diagnosis of a subject as healthy, having a
precancerous lesion of the large intestine, having a colorectal cancer, having a metastasised colorectal
cancer and/or a non-malignant disease of the large intestine. Preferably, binding molecules are
antibodies specific for said polypeptides.

The biomolecules related to the invention have a molecular mass selected from the group consisting
of 2020 Da ± 10 Da, 2049 Da ± 10 Da, 2270 Da ± 11 Da, 2568 Da ± 13 Da, 2732 Da ± 14 Da, 3026
Da ± 15 Da, 3227 Da ± 17 Da, 3326 Da ± 17 Da, 3456 Da ± 17 Da, 3946 Da ± 20 Da, 4103 Da ± 21
Da, 4242 Da ± 21 Da, 4295 Da ± 21 Da, 4359 Da ± 22 Da, 4476 Da ± 22 Da, 4546 Da ± 23 Da, 4607
Da ± 23 Da, 4719 Da ± 24 Da, 4830 Da ± 24 Da, 4865 Da ± 24 Da, 4963 Da ± 25 Da, 5112 Da ± 26
Da, 5226 Da ± 26 Da, 5493 Da ± 27 Da, 5648 Da ± 28 Da, 5772 Da ± 29 Da, 5854 Da ± 29 Da, 6446
Da ± 32 Da, 6644 Da ± 33 Da, 6852 Da ± 34 Da, 6897 Da ± 34 Da, 6999 Da ± 35 Da, 7575 Da ± 38
Da, 7657 Da ± 38 Da, 8076 Da ± 40 Da, 8215 Da ± 41 Da, 8474 Da ± 42 Da, 8574 Da ± 43 Da, 8702
Da ± 44 Da, 8780 Da ± 44 Da, 8922 Da ± 45 Da, 9078 Da ± 45 Da, 9143 Da ± 46 Da, 9201 Da ± 46
Da, 9359 Da ± 47 Da, 9425 Da ± 47 Da, 9581 Da ± 48 Da, 9641 Da ± 48 Da, 9718 Da ± 49 Da, 9930
Da ± 50 Da, 10215 Da ± 51 Da, 10369 Da ± 52 Da, 10440 Da ± 52 Da, 10594 Da ± 53 Da, 11216 Da
± 56 Da, 11464 Da ± 57 Da, 11547 Da ± 58 Da, 11693 Da ± 58 Da, 11905 Da ± 60 Da, 12470 Da ± 62
Da, 12619 Da ± 63 Da, 12828 Da ± 64 Da, 13290 Da ± 66 Da, 13632 Da ± 68 Da, 13784 Da ± 69 Da,
13983 Da ± 70 Da, 14798 Da ± 74 Da, 15005 Da ± 75 Da, 15140 Da ± 76 Da, 15350 Da ± 77 Da,
15879 Da ± 79 Da, 15957 Da ± 80 Da, 16104 Da ± 81 Da, 16164 Da ± 81 Da, 16953 Da ± 85 Da,
17263 Da ± 86 Da, 17397 Da ± 87 Da, 17617 Da ± 88 Da, 17766 Da ± 89 Da, 17890 Da ± 89 Da,
18115 Da ± 91 Da, 18390 Da ± 92 Da, 22338 Da ± 112 Da, 22466 Da ± 112 Da, 22676 Da ± 113 Da,
22951 Da ± 115 Da, 24079 Da ± 120 Da, 28055 Da ± 140 Da, or 28259 Da ± 141 Da, and may
include, but are not limited to, molecules comprising nucleotides, amino acids, sugars, fatty acids,
steroids, nucleic acids, polynucleotides (DNA or RNA), polypeptides, proteins, antibodies,
carbohydrates, lipids, and combinations thereof (e.g., glycoproteins, ribonucleoproteins, lipoproteins).
Preferably said biomolecules are proteins, polypeptides, or fragments thereof.

In yet another embodiment, the invention provides a method for the identification of biomolecules
within a sample, provided that the biomolecules are proteins, polypeptides or fragments thereof,
comprising: chromatography and fractionation, analysis of fractions for the presence of said
differentially expressed proteins and/or fragments thereof, using a biologically active surface, further
analysis using mass spectrometry to obtain amino acid sequences encoding said proteins and/or
fragments thereof, and searching amino acid sequence databases of known proteins to identify said
differentially expressed proteins by amino acid sequence comparison. Preferably the method of
chromatography is high performance liquid chromatography (HPLC) or fast protein liquid
chromatography (FPLC). Furthermore, the mass spectrometry used is selected from the group of
matrix-assisted laser desorption ionization/time of flight (MALDI-TOF), surface enhanced laser
desorption ionisation/time of flight (SELDI-TOF), liquid chromatography, MS-MS, or ESI-MS.

Furthermore, the invention provides kits for the differential diagnosis of a colorectal cancer and/or a 
non-malignant disease of the colon. 

The test or biological samples used according to the invention may be of blood, blood serum, plasma,
nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic fluid, excreta, tears, saliva,
sweat, biopsy, ascites, cerebrospinal fluid, milk, lymph, or tissue extract origin. Preferably, the test
and/or biological samples are blood serum samples, and are isolated from subjects of mammalian
origin, preferably of human origin.

A colorectal cancer of the invention is a cancer of the large intestine, and may include cancers of the
colon, rectum etc. Furthermore, a colorectal cancer, as intended by the invention, may be of various
stages and/or grades.

DESCRIPTION OF FIGURES

Figure 1. Comparison of protein mass spectra processed on the anion exchange surface of a SAX2
ProteinChip array comprised of cationic quaternary ammonium groups. Protein mass spectra obtained
from sera of endoscopy control patients (C1 and C2) suffering from non-malignant diseases of the
large intestine (e.g., acute or chronic inflammation, adenoma) and of patients with colon cancer (T1
and T2) are shown. Scattered boxes indicate differentially expressed proteins with high diagnostic
significance. A representative differentially expressed protein (m/z = 6645 Da) is highlighted; it
possesses high importance within the generated classifiers (ensemble of decision trees) according to
overall improvement, see Tables 1-4. The X-axis shows the mass/charge (m/z) ratio, which is
equivalent to the apparent molecular mass of the corresponding biomolecule. The Y-axis shows the
normalized relative signal intensity of the peak in the examined serum samples.

Figure 2A - Scatter plots of clusters (peaks, variables), belonging to differentially expressed
proteins included in the four classifiers. The X-axis shows the mass/charge (m/z) ratio, which is
equivalent to the apparent molecular mass of the corresponding biomolecule. The Y-axis shows the
logarithmic normalized relative signal intensity of the peaks in the examined serum samples. First,
intensities were shifted to yield entirely positive values. Then, for each mass, intensities were
normalized by dividing the intensity values by the average intensity of that mass. Finally, the natural
logarithm was taken. a T (Tumour): Colon cancer patients' serum samples, o N (Normal): Endoscopy
control patients' serum samples.
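
The three normalization steps described for the Y-axis of Figure 2 (shift, divide by the per-mass average, natural logarithm) can be sketched as follows. This is a minimal illustration, not the procedure actually used to produce the figure: the size of the shift is not stated in the text, so the `epsilon` offset below is an assumption, as is the function name `log_normalize`.

```python
import math


def log_normalize(intensities, epsilon=1.0):
    """Shift, mean-normalize and log-transform the intensities of one mass.

    intensities: raw peak intensities of a single mass across all samples.
    epsilon: assumed smallest value after shifting (not given in the text).
    """
    lowest = min(intensities)
    # Step 1: shift only if needed, so that every value is strictly positive.
    shift = epsilon - lowest if lowest <= 0 else 0.0
    shifted = [x + shift for x in intensities]
    # Step 2: divide by the average intensity of this mass.
    mean = sum(shifted) / len(shifted)
    # Step 3: take the natural logarithm.
    return [math.log(x / mean) for x in shifted]


if __name__ == "__main__":
    print(log_normalize([-1.0, 1.0, 3.0]))
```

After this transformation a value of 0 corresponds to the average intensity of that mass, negative values lie below it and positive values above it.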

Figure 3A - F. Additionally scaled scatter plots of clusters (peaks, variables), belonging to
differentially expressed proteins included in the four classifiers. The X-axis shows the mass/charge
(m/z) ratio, which is equivalent to the apparent molecular mass of the corresponding biomolecule. As
in Figure 2, the Y-axis shows the logarithmic normalized relative signal intensity of the peaks in the
examined serum samples. However, intensities were additionally (shifted and) scaled so that the
intensities of each mass cover the entire range of the Y-axis. Thereby, the minimum and maximum
intensities of all masses are aligned on the lower and upper edge of the plot, respectively. This allows
better visualization of the extent of class overlap. a T (Tumour): Colon cancer patients' serum samples,
o N (Normal): Endoscopy control patients' serum samples.
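
The additional scaling described for Figure 3 amounts to a per-mass min-max rescaling. The sketch below is one straightforward way to do this, under the assumption that the Y-axis range is the interval from `lower` to `upper`; the function name `scale_to_axis` and the handling of constant inputs are illustrative choices not taken from the text.

```python
def scale_to_axis(values, lower=0.0, upper=1.0):
    """Rescale the values of one mass so they span [lower, upper]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: place them mid-axis (an arbitrary choice).
        return [(lower + upper) / 2 for _ in values]
    span = upper - lower
    return [lower + (v - lo) * span / (hi - lo) for v in values]


if __name__ == "__main__":
    print(scale_to_axis([2.0, 4.0, 6.0]))  # minimum maps to 0.0, maximum to 1.0
```

Applying this separately to every mass places each minimum on the lower edge and each maximum on the upper edge of the plot, as the caption describes.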
Figure 4. Complexity of proof-of-principle classifier. The histogram visualizes the distribution of the
number of decision tree variables (peaks, clusters) for the obtained proof-of-principle classifier for
gastric cancer. 6 variables per decision tree are typical.

Figure 5. Variable importance of the proof-of-principle classifier. The histograms visualize how often
a variable (mass) is employed in the proof-of-principle classifier. The frequency of variable selection
is presented in histogram form for each hierarchical level (a-j) and for all hierarchical levels taken
together (k).

Figure 6. Complexity of 1st final classifier. The histogram visualizes the distribution of the number of
decision tree variables (peaks, clusters) for the obtained 1st final classifier in the range of 1 to 10
decision tree variables. 9 variables per decision tree are typical.

Figure 7. Variable importance of 1st final classifier. The histogram visualizes how often a variable
(mass) is employed in the 1st final classifier. The frequency of variable selection is presented in histogram
form for each of the first 10 hierarchical levels (a-j) and for the first ten hierarchical levels taken
together (k).

Figure 8. Complexity of 2nd final classifier. The histogram visualizes the distribution of the number of
decision tree variables (peaks, clusters) for the obtained 2nd final classifier in the range of 1 to 10
decision tree variables. As many as 10 variables per decision tree are typical.

Figure 9. Variable importance of 2nd final classifier. The histogram visualizes how often a variable
(mass) is employed in the 2nd final classifier. The frequency of variable selection is presented in
histogram form for each of the first 10 hierarchical levels (a-j) and for the first ten hierarchical levels
taken together (k).

Figure 10. Complexity of 3rd final classifier. The histogram visualizes the distribution of the number
of decision tree variables (peaks, clusters) for the obtained 3rd final classifier in the range of 1 to 10
decision tree variables. As many as 10 variables per decision tree are typical.

Figure 11. Variable importance of 3rd final classifier. The histogram visualizes how often a variable
(mass) is employed in the 3rd final classifier. The frequency of variable selection is presented in
histogram form for each of the first 10 hierarchical levels (a-j) and for the first ten hierarchical levels
taken together (k).

DESCRIPTION OF THE INVENTION

It is to be understood that the present invention is not limited to the particular materials, methods
or equipment described, as these may vary. It is also to be understood that the terminology used herein
is for the purpose of describing particular embodiments only, and is not intended to limit the scope of
the present invention, which will be limited only by the appended claims.

It should be noted that as used herein and in the appended claims, the singular forms "a," "an," and
"the" include plural reference unless the context clearly dictates otherwise. Thus, for example, a
reference to "an antibody" is a reference to one or more antibodies and derivatives thereof known to
those skilled in the art, and so forth.

Unless defined otherwise, all technical and scientific terms used herein have the same meanings as
commonly understood by one of ordinary skill in the art. Although any materials and methods, or
equipment comparable to those specifically described herein can be used to practice or test the present
invention, the preferred equipment, materials and methods are described below. All publications
mentioned herein are cited for the purpose of describing and disclosing protocols, reagents, and
current state of the art technologies that might be used in connection with the invention. Nothing
herein is to be construed as an admission that the invention is not entitled to precede such disclosure,
by virtue of prior invention.

Definitions 

The term "biomolecule" refers to a molecule produced by a cell or living organism. Such molecules
include, but are not limited to, molecules comprising nucleotides, amino acids, sugars, fatty acids,
steroids, nucleic acids, polynucleotides, polypeptides, proteins, carbohydrates, lipids, and
combinations thereof (e.g., glycoproteins, ribonucleoproteins, lipoproteins). Furthermore, the terms
"nucleotide" or "polynucleotide" refer to a nucleotide, oligonucleotide, polynucleotide, or any fragment
thereof. These phrases also refer to DNA or RNA of genomic or synthetic origin which may be
single-stranded or double-stranded and may represent the sense, or the antisense strand, to peptide
polynucleotide sequences (i.e. peptide nucleic acids; PNAs), or to any DNA-like or RNA-like
material.

The term "fragment" refers to a portion of a polypeptide (parent) sequence that comprises at least 10
consecutive amino acid residues and retains a biological activity and/or some functional characteristics
of the parent polypeptide, e.g. antigenicity or structural domain characteristics.

The terms "biological sample" and "test sample" refer to all biological fluids and excretions isolated
from any given subject. In the context of the invention such samples include, but are not limited to,
blood, blood serum, plasma, nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic
fluid, excreta, tears, saliva, sweat, biopsy, ascites, cerebrospinal fluid, milk, lymph, or tissue extract
samples.

The term "specific binding" refers to the binding reaction between a biomolecule and a specific
"binding molecule". Related to the invention are binding molecules that include, but are not limited to,
proteins, peptides, nucleotides, nucleic acids, hormones, amino acids, sugars, fatty acids, steroids,
polynucleotides, carbohydrates, lipids, or a combination thereof (e.g. glycoproteins,
ribonucleoproteins, lipoproteins). Furthermore, a binding reaction is considered to be specific when
the interaction between said molecules is substantial. In the context of the invention, a binding
reaction is considered substantial when the reaction that takes place between said molecules is at least
two times the background. Moreover, the term "specific binding conditions" refers to reaction
conditions that permit the binding of said molecules such as pH, salt, detergent and other conditions
known to those skilled in the art.
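
The "at least two times the background" criterion above can be expressed as a simple ratio check. The sketch below is illustrative only: how signal and background are measured is not specified here, and the function name `is_specific_binding` and its parameters are hypothetical.

```python
def is_specific_binding(signal, background, min_ratio=2.0):
    """Return True when the signal is at least min_ratio times the background."""
    if background <= 0:
        raise ValueError("background must be positive to form a ratio")
    return signal / background >= min_ratio


if __name__ == "__main__":
    print(is_specific_binding(10.0, 5.0))  # exactly twice the background
    print(is_specific_binding(9.0, 5.0))   # below twice the background
```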

The term "interaction" relates to the direct or indirect binding or alteration of biological activity of a
biomolecule.

The term "differential diagnosis" refers to a diagnostic decision between a healthy state and different
disease states, including various stages of a specific disease. A subject is diagnosed as healthy or as
suffering from a specific disease, or a specific stage of a disease, based on a set of hypotheses that
allow for the distinction between healthy and one or more stages of the disease. The choice between
healthy and one or more stages of disease depends on a significant difference between each
hypothesis. Under the same principle, a "differential diagnosis" may also refer to a diagnostic decision
between one disease type as compared to another (e.g. colon cancer vs. diverticulosis).

The term "colorectal cancer" refers to a cancer state associated with the large intestine of any given
subject, wherein the cancer state is defined according to its stage and/or grade. The various stages of a
cancer may be identified using staging systems known to those skilled in the art [e.g. Union
Internationale Contre le Cancer (UICC) system or American Joint Committee on Cancer (AJCC)]. In the
context of the invention colorectal cancers include but are not limited to colon and rectal cancers.

The term "non-malignant disease of the large intestine" refers to alterations in the physiological,
functional and/or anatomical state of the large intestine, wherein the alterations deviate from normal.
In addition, this term encompasses alterations in the physiological, functional and/or anatomical state
of the large intestine that cannot be staged or graded according to cancer staging systems known to
those skilled in the art [e.g. Union Internationale Contre le Cancer (UICC) system or American Joint
Committee on Cancer (AJCC)]. Such non-malignant diseases include but are not limited to acute and
chronic inflammation of the large intestinal epithelium, diverticular disease including diverticulosis
and diverticulitis, colitis, ulcerative colitis, pancolitis, Crohn's disease (ileitis), proctitis, and intestinal
polyps including hyperplastic polyps, hamartomatous polyps (i.e. juvenile polyps, Peutz-Jeghers
polyps), inflammatory polyps, lymphoid polyps, and adenomatous polyps.

The term "healthy individual" refers to a subject possessing good health. Such a subject demonstrates 
an absence of any disease within the large intestine, preferably a colorectal cancer or a non-malignant 
disease of the large intestine. 

The term "precancerous lesion of the large intestine" refers to a biological change within a cell and/or
tissue of the large intestine such that said cell and/or tissue becomes susceptible to the development of
a cancer. More specifically, a precancerous lesion of the large intestine is a preliminary stage of a
colorectal cancer (i.e. dysplasia). Causes of a precancerous lesion of the large intestine may include,
but are not limited to, genetic predisposition and exposure to cancer-causing agents (carcinogens);
such cancer-causing agents include agents that cause genetic damage and induce neoplastic
transformation of a cell. Furthermore, the phrase "neoplastic transformation of a cell" refers to an
alteration in normal cell physiology and includes, but is not limited to, self-sufficiency in growth
signals, insensitivity to growth-inhibitory (anti-growth) signals, evasion of programmed cell death
(apoptosis), limitless replicative potential, sustained angiogenesis, and tissue invasion and metastasis.

The term "dysplasia" refers to morphological alterations within a tissue, which are characterised by a
loss in the uniformity of individual cells, as well as a loss in their architectural orientation.
Furthermore, dysplastic cells also exhibit a variation in size and shape.

The phrase "differentially present" refers to differences in the quantity of a biomolecule (of a
particular apparent molecular mass) present in a sample from a subject as compared to a comparable
sample. For example, a biomolecule is present at an elevated level, at a decreased level, or absent in
samples of subjects having colorectal cancer compared to samples of subjects who do not have a
cancer of the large intestine. Therefore, in the context of the invention, the term "differentially present
biomolecule" refers to the quantity of a biomolecule (of a particular apparent molecular mass) present
within a sample taken from a subject having a disease or cancer of the large intestine as compared to a
comparable sample taken from a healthy subject. Within the context of the invention, a biomolecule is
differentially present between two samples if the quantity of said biomolecule in one sample is
statistically significantly different from the quantity of said biomolecule in another sample.
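
By way of illustration only, one possible way of testing whether the quantity of a biomolecule differs
significantly between two groups of samples is an exact permutation test on the difference of mean
peak intensities. The description does not prescribe any particular statistical test; the function name
and the intensity values below are hypothetical and serve only to make the definition concrete.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the absolute difference of means.

    Every way of relabelling the pooled values into groups of the original
    sizes is enumerated, so this is only practical for small groups.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    total = 0
    at_least_as_extreme = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance so that the observed labelling itself is counted.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical peak intensities of one biomolecule in two sample groups.
intensities_group_1 = [10, 11, 12, 13]
intensities_group_2 = [1, 2, 3, 4]
p_value = exact_permutation_p_value(intensities_group_1, intensities_group_2)
```

A small p-value would support calling the biomolecule differentially present between the two groups;
the significance threshold is a choice left to the practitioner.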

The term "diagnostic assay" can be used interchangeably with "diagnostic method" and refers to the
detection of the presence or nature of a pathologic condition. Diagnostic assays differ in their
sensitivity and specificity. Within the context of the invention, the sensitivity of a diagnostic assay is
defined as the percentage of diseased subjects who test positive for a colorectal cancer or a
non-malignant disease of the large intestine and are considered "true positives". Subjects having a
colorectal cancer or a non-malignant disease of the large intestine but not detected by the diagnostic
assay are considered "false negatives". Subjects who are not diseased and who test negative in the
diagnostic assay are considered "true negatives". Furthermore, the term specificity of a diagnostic
assay, as used herein, is defined as 1 minus the false positive rate, where the "false positive rate" is
defined as the proportion of those subjects devoid of a colorectal cancer or a non-malignant disease of
the large intestine but who test positive in said assay.
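
The definitions of sensitivity and specificity given above can be sketched as a short calculation. The
function name and the counts used below are hypothetical and are provided for illustration only; they
are not results of the invention.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) as fractions between 0 and 1.

    sensitivity = true positives / all diseased subjects
    specificity = 1 - false positive rate
    """
    diseased = true_pos + false_neg
    non_diseased = true_neg + false_pos
    if diseased == 0 or non_diseased == 0:
        raise ValueError("both diseased and non-diseased subjects are required")
    sensitivity = true_pos / diseased
    false_positive_rate = false_pos / non_diseased
    specificity = 1.0 - false_positive_rate
    return sensitivity, specificity


# Hypothetical counts for a study population (illustration only).
sens, spec = sensitivity_specificity(true_pos=45, false_neg=5, true_neg=90, false_pos=10)
```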

The term "adsorbent" refers to any material that is capable of accumulating (binding) a biomolecule.
The adsorbent typically coats a biologically active surface and is composed of a single material or a
plurality of different materials that are capable of binding a biomolecule. Such materials include, but
are not limited to, anion exchange materials, cation exchange materials, metal chelators,
polynucleotides, oligonucleotides, peptides, antibodies, etc.

The term "biologically active surface" refers to any two- or three-dimensional extension of a material
that biomolecules can bind to, or interact with, due to the specific biochemical properties of this
material and those of the biomolecules. Such biochemical properties include, but are not limited to,
ionic character (charge), hydrophobicity, or hydrophilicity.

The term "binding molecule" refers to a molecule that displays an affinity for another molecule.
Within the context of the invention, such molecules may include, but are not limited to, nucleotides,
amino acids, sugars, fatty acids, steroids, nucleic acids, polypeptides, carbohydrates, lipids, and
combinations thereof (e.g. glycoproteins, ribonucleoproteins, lipoproteins). Preferably, such binding
molecules are antibodies.

The term "solution" refers to a homogeneous mixture of two or more substances. Solutions may
include, but are not limited to, buffers, substrate solutions, elution solutions, wash solutions, detection
solutions, standardisation solutions, chemical solutions, solvents, etc. Furthermore, other solutions
known to those skilled in the art are also included herein.

The term "mass profile" refers to a mass spectrum as a characteristic property of a given sample or a
group of samples, especially when compared to the mass profile of a second sample or group of
samples in any way different from the first sample or group of samples. In the context of the invention,
the mass profile is obtained by treating the biological sample as follows. The sample is diluted 1:5 in
a denaturation buffer consisting of 7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT, and 2% ampholine
and subsequently diluted 1:10 in binding buffer consisting of 0.1 M Tris-HCl, 0.02% Triton X-100 at
pH 8.5. The thus pre-treated sample is applied to a biologically active surface comprising positively
charged quaternary ammonium groups (anion exchange surface) and incubated for 120 minutes. The
biomolecules bound to the surface are analysed by gas phase ion spectrometry as described in another
section. All but the dilution steps are performed at 20 to 24°C. Dilution steps are performed at 0 to
4°C.

The phrase "apparent molecular mass" refers to the molecular mass value in Dalton (Da) of a 
biomolecule as it may appear in a given method of investigation, e.g. size exclusion chromatography, 
gel electrophoresis, or mass spectrometry. 

The term "chromatography" refers to any method of separating biomolecules within a given sample
such that the original native state of a given biomolecule is retained. Separation of a biomolecule from
other biomolecules within a given sample for the purpose of enrichment, purification and/or analysis
may be achieved by methods including, but not limited to, size exclusion chromatography, ion
exchange chromatography, hydrophobic and hydrophilic interaction chromatography, metal affinity
chromatography, wherein "metal" refers to metal ions (e.g. nickel, copper, gallium, or zinc) of all
chemically possible valences, or ligand affinity chromatography, wherein "ligand" refers to binding
molecules, preferably proteins, antibodies, or DNA. Generally, chromatography uses biologically
active surfaces as adsorbents to selectively accumulate certain biomolecules.

The term "mass spectrometry" refers to a method comprising employing an ionization source to
generate gas phase ions from a biological entity of a sample presented on a biologically active surface
and detecting the gas phase ions with a mass spectrometer.

The phrase "laser desorption mass spectrometry" refers to a method comprising the use of a laser as an
ionization source to generate gas phase ions from a biomolecule presented on a biologically active
surface and detecting the gas phase ions with a mass spectrometer.

The term "mass spectrometer" refers to a gas phase ion spectrometer that includes an inlet system, an
ionisation source, an ion optic assembly, a mass analyser, and a detector.

Within the context of the invention, the terms "detect", "detection" or "detecting" refer to the
identification of the presence, absence, or quantity of a biomolecule.

The term "energy absorbing molecule" or "EAM" refers to a molecule that absorbs energy from an
energy source in a mass spectrometer, thereby enabling desorption of a biomolecule from a
biologically active surface. Cinnamic acid derivatives, sinapinic acid and dihydroxybenzoic acid are
frequently used as energy-absorbing molecules in laser desorption of biomolecules. See U.S. Pat. No.
5,719,060 (Hutchens & Yip) for a further description of energy absorbing molecules.

The term "training set" refers to a subset of the respective entire available data set. This subset is 
typically randomly selected, and is solely used for the purpose of classifier construction. 

The term "test set" refers to a subset of the entire available data set consisting of those entries not
included in the training set. Test data is applied to evaluate classifier performance.
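
As a minimal sketch of the two definitions above, a data set may be divided at random into a training
set and a complementary test set. The function name, the 70% training fraction and the fixed random
seed are assumptions made for illustration; the description does not specify them.

```python
import random


def split_training_test(entries, training_fraction=0.7, seed=0):
    """Randomly select a training set; all remaining entries form the test set."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    indices = list(range(len(entries)))
    rng.shuffle(indices)
    n_train = int(round(training_fraction * len(entries)))
    train_idx = set(indices[:n_train])
    training_set = [e for i, e in enumerate(entries) if i in train_idx]
    test_set = [e for i, e in enumerate(entries) if i not in train_idx]
    return training_set, test_set


# Hypothetical sample identifiers standing in for mass profiles.
samples = [f"sample_{i}" for i in range(10)]
training_set, test_set = split_training_test(samples)
```

The training set would be used solely to construct the classifier, and the test set solely to evaluate it.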

The term "decision tree" refers to a flow-chart-like tree structure employed for classification. Decision
trees consist of repeated splits of a data set into subsets. Each split consists of a simple rule applied to
one variable, e.g., "if value of 'variable V' larger than 'threshold V' then go left, else go right".
Accordingly, the given feature space is partitioned into a set of rectangles with each rectangle assigned
to one class.
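
The splitting rule quoted above can be sketched as follows. The tree is represented here as nested
dictionaries, which is merely one convenient choice; the variable names, thresholds and class labels
are hypothetical and are not taken from any classifier of the invention.

```python
def classify(node, sample):
    """Follow the splits until a leaf (a plain class label) is reached.

    Each internal node applies one rule to one variable:
    if the value is larger than the threshold go left, else go right.
    """
    while isinstance(node, dict):
        if sample[node["variable"]] > node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node


# Hypothetical tree; variables stand for peak intensities at given masses.
tree = {
    "variable": "m5772",
    "threshold": 2.0,
    "left": {
        "variable": "m2020",
        "threshold": 1.0,
        "left": "class A",
        "right": "class B",
    },
    "right": "class C",
}

label = classify(tree, {"m5772": 3.0, "m2020": 0.5})
```

Because every rule compares a single variable with a threshold, each leaf corresponds to one
rectangular region of the feature space.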

The terms "ensemble", "tree ensemble" or "ensemble classifier" can be used interchangeably and refer
to a classifier that consists of many simpler elementary classifiers, e.g., an ensemble of decision trees
is a classifier consisting of decision trees. The result of the ensemble classifier is obtained by
combining all the results of its constituent classifiers, e.g., by majority voting that weights all
constituent classifiers equally. Majority voting is especially reasonable in the case of bagging, where
constituent classifiers are then naturally weighted by the frequency with which they are generated.
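
Majority voting with equal weights, as described above, can be sketched as follows. The constituent
classifiers are represented by simple hypothetical threshold rules rather than full decision trees, and
the tie-breaking behaviour is an assumption of this sketch only.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Combine constituent classifiers by equally weighted majority voting.

    Ties are resolved in favour of the label that was voted for first.
    """
    votes = Counter(clf(sample) for clf in classifiers)
    label, _count = votes.most_common(1)[0]
    return label


# Three hypothetical one-rule classifiers standing in for decision trees.
constituents = [
    lambda s: "A" if s["x"] > 1 else "B",
    lambda s: "A" if s["y"] > 1 else "B",
    lambda s: "A" if s["z"] > 1 else "B",
]

result = majority_vote(constituents, {"x": 2, "y": 2, "z": 0})
```

In a bagged ensemble, a tree that is generated repeatedly from different resampled training sets
simply appears several times in the list and thereby receives a proportionally larger share of the vote.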

The term "competitor" refers to a variable (in our case: mass) that can be used as an alternative
splitting rule in a decision tree. In each step of decision tree construction, only the variable yielding
the best data splitting is selected. Competitors are non-selected variables with similar but lower
performance than the selected variable. They point in the direction of alternative decision trees.

The term "surrogate" refers to a splitting rule that closely mimics the action of the primary split. A
surrogate is a variable that can substitute for a selected decision tree variable, e.g. in the case of
missing values. Not only must a good surrogate split the parent node into descendant nodes similar in
size and composition to the primary descendant nodes; the surrogate must also match the primary
split on the specific cases that go to the left child and right child nodes.
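
One simple way of quantifying how closely a candidate surrogate mimics a primary split is the fraction
of cases that both rules send to the same child node. This measure, the function names and the data
below are assumptions made for illustration and are not asserted to be the measure used by any
particular decision tree software.

```python
def goes_left(value, threshold):
    """Splitting rule: larger than the threshold goes left, otherwise right."""
    return value > threshold


def surrogate_agreement(samples, primary, surrogate):
    """Fraction of cases sent to the same child by both splits.

    primary and surrogate are (variable, threshold) pairs.
    """
    same = 0
    for sample in samples:
        primary_left = goes_left(sample[primary[0]], primary[1])
        surrogate_left = goes_left(sample[surrogate[0]], surrogate[1])
        if primary_left == surrogate_left:
            same += 1
    return same / len(samples)


# Hypothetical intensity values for two variables "a" and "b".
cases = [
    {"a": 3, "b": 30},
    {"a": 1, "b": 10},
    {"a": 4, "b": 5},
    {"a": 0, "b": 2},
]
agreement = surrogate_agreement(cases, primary=("a", 2), surrogate=("b", 8))
```

A surrogate suitable for substituting a variable with missing values would be expected to show an
agreement close to 1.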


The terms "peak" and "signal" may be used interchangeably and refer to any signal which is generated
by a biomolecule when under investigation using a specific method, for example chromatography,
mass spectrometry, or any type of spectroscopy like Ultraviolet/Visible Light (UV/Vis) spectroscopy,
Fourier Transformed Infrared (FTIR) spectroscopy, Electron Paramagnetic Resonance (EPR)
spectroscopy, or Nuclear Magnetic Resonance (NMR) spectroscopy.

Within the context of the invention, the terms "peak" and "signal" refer to the signal generated by a
biomolecule of a certain molecular mass hitting the detector of a mass spectrometer, thus generating a
signal intensity which correlates with the amount or concentration of said biomolecule in a given
sample. A "peak" or "signal" is defined by two values: an apparent molecular mass value and an
intensity value generated as described. The mass value is an elemental characteristic of a biological
entity, whereas the intensity value accords to a certain amount or concentration of a biological entity
with the corresponding apparent molecular mass value, and thus "peak" and "signal" always refer to
the properties of this biological entity.

The term "cluster" refers to a signal or peak present in a certain set of mass spectra or mass profiles
obtained from different samples belonging to two or more different groups (e.g. cancer and non
cancer). Within the set, signals belonging to a cluster can differ in their intensities, but not in their
apparent molecular masses.

The term "variable" refers to a cluster which is subjected to a statistical analysis aiming towards a
classification of samples into two or more different sample groups (e.g. cancer and non cancer) by
using decision trees, wherein the sample feature relevant for classification is the intensity value of the
variables in the analysed samples.
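
The relationship between peaks, clusters and variables can be sketched as follows. Peaks from several
spectra are grouped into clusters when their apparent molecular masses agree within a relative
tolerance, and each cluster then supplies one intensity value per sample. The grouping procedure, the
0.5% tolerance and the data are assumptions made for illustration; no particular clustering method is
prescribed here.

```python
def cluster_peaks(spectra, rel_tol=0.005):
    """Group peaks from several spectra into clusters of matching mass.

    spectra maps a sample identifier to a list of (mass_da, intensity) peaks.
    A peak joins the current cluster if its mass lies within rel_tol of the
    cluster's reference mass (the lowest mass assigned to that cluster).
    If one sample contributes two peaks to a cluster, the later one is kept.
    """
    all_peaks = sorted(
        (mass, intensity, sample_id)
        for sample_id, peaks in spectra.items()
        for mass, intensity in peaks
    )
    clusters = []
    for mass, intensity, sample_id in all_peaks:
        if clusters and abs(mass - clusters[-1]["mass"]) <= rel_tol * clusters[-1]["mass"]:
            clusters[-1]["intensities"][sample_id] = intensity
        else:
            clusters.append({"mass": mass, "intensities": {sample_id: intensity}})
    return clusters


# Hypothetical peak lists for three samples.
spectra = {
    "s1": [(2020.0, 5.0), (5772.0, 1.5)],
    "s2": [(2024.0, 2.0), (5760.0, 3.0)],
    "s3": [(2018.0, 7.0)],
}
clusters = cluster_peaks(spectra)
```

Each resulting cluster corresponds to one variable, and its per-sample intensities are the feature
values that a decision tree would split on.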

Detailed Description of the Invention

a) Diagnostics

The present invention relates to methods for the differential diagnosis of colorectal cancers or a
non-malignant disease of the large intestine by detecting one or more differentially expressed
biomolecules within a test sample of a given subject, comparing results with samples from healthy
subjects, subjects having a precancerous lesion of the large intestine, subjects having a colorectal
cancer, subjects having a metastasised colorectal cancer, or subjects having a non-malignant disease of
the large intestine, wherein the comparison allows for the differential diagnosis of a subject as healthy,
having a precancerous lesion of the large intestine, having a colorectal cancer, having a metastasised
colorectal cancer or a non-malignant disease of the large intestine.

In one aspect of the invention, a method for the differential diagnosis of a colorectal cancer or a
non-malignant disease of the large intestine comprises obtaining a test sample from a given subject,
contacting said sample with an adsorbent present on a biologically active surface under specific
binding conditions, allowing the biomolecules within the test sample to bind to said adsorbent,
detecting one or more bound biomolecules using a detection method, wherein the detection method
generates a mass profile of said sample, transforming mass profile data into a computer-readable form,
and comparing the mass profile of said sample with a database containing mass profiles from
comparable samples specific for healthy subjects, subjects having a precancerous lesion of the large
intestine, subjects having a colorectal cancer, subjects having a metastasised colorectal cancer, or
subjects having a non-malignant disease of the large intestine. A comparison of mass profiles allows
the medical practitioner to determine if a subject is healthy, has a precancerous lesion of the large
intestine, a colorectal cancer, a metastasised colorectal cancer or a non-malignant disease of the large
intestine based on the presence, absence or quantity of specific biomolecules.

In more than one embodiment, a single biomolecule or a combination of more than one biomolecule
selected from the group having an apparent molecular mass of 2020 Da ± 10 Da, 2049 Da ± 10 Da,
2270 Da ± 11 Da, 2508 Da ± 13 Da, 2732 Da ± 14 Da, 3026 Da ± 15 Da, 3227 Da ± 17 Da,
3326 Da ± 17 Da, 3456 Da ± 17 Da, 3946 Da ± 20 Da, 4103 Da ± 21 Da, 4242 Da ± 21 Da,
4295 Da ± 21 Da, 4359 Da ± 22 Da, 4476 Da ± 22 Da, 4546 Da ± 23 Da, 4607 Da ± 23 Da,
4719 Da ± 24 Da, 4830 Da ± 24 Da, 4865 Da ± 24 Da, 4963 Da ± 25 Da, 5112 Da ± 26 Da,
5226 Da ± 26 Da, 5493 Da ± 27 Da, 5648 Da ± 28 Da, 5772 Da ± 29 Da, 5854 Da ± 29 Da,
6446 Da ± 32 Da, 6644 Da ± 33 Da, 6852 Da ± 34 Da, 6897 Da ± 34 Da, 6999 Da ± 35 Da,
7575 Da ± 38 Da, 7657 Da ± 38 Da, 8076 Da ± 40 Da, 8215 Da ± 41 Da, 8474 Da ± 42 Da,
8574 Da ± 43 Da, 8702 Da ± 44 Da, 8780 Da ± 44 Da, 8922 Da ± 45 Da, 9078 Da ± 45 Da,
9143 Da ± 46 Da, 9201 Da ± 46 Da, 9359 Da ± 47 Da, 9425 Da ± 47 Da, 9581 Da ± 48 Da,
9641 Da ± 48 Da, 9718 Da ± 49 Da, 9930 Da ± 50 Da, 10215 Da ± 51 Da, 10369 Da ± 52 Da,
10440 Da ± 52 Da, 10594 Da ± 53 Da, 11216 Da ± 56 Da, 11464 Da ± 57 Da, 11547 Da ± 58 Da,
11693 Da ± 58 Da, 11905 Da ± 60 Da, 12470 Da ± 62 Da, 12619 Da ± 63 Da, 12828 Da ± 64 Da,
13290 Da ± 66 Da, 13632 Da ± 68 Da, 13784 Da ± 69 Da, 13983 Da ± 70 Da, 14798 Da ± 74 Da,
15005 Da ± 75 Da, 15140 Da ± 76 Da, 15350 Da ± 77 Da, 15879 Da ± 79 Da, 15957 Da ± 80 Da,
16104 Da ± 81 Da, 16164 Da ± 81 Da, 16953 Da ± 85 Da, 17263 Da ± 86 Da, 17397 Da ± 87 Da,
17617 Da ± 88 Da, 17766 Da ± 89 Da, 17890 Da ± 89 Da, 18115 Da ± 91 Da, 18390 Da ± 92 Da,
22338 Da ± 112 Da, 22466 Da ± 112 Da, 22676 Da ± 113 Da, 22951 Da ± 115 Da, 24079 Da ± 120 Da,
28055 Da ± 140 Da, or 28259 Da ± 141 Da may be detected within a given sample. Detection of a
single biomolecule or a combination of more than one biomolecule of the invention is based on
specific sample pre-treatment conditions, the pH of binding conditions, and the type of biologically
active surface used for the detection of biomolecules. For example, prior to the detection of the
biomolecules described herein, a given sample is pre-treated by diluting 1:5 in a denaturation buffer
consisting of 7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT, and 2% ampholine. The denatured sample
is then diluted 1:10 in a specific binding buffer (0.1 M Tris-HCl, 0.02% Triton X-100, pH 8.5), applied
to a biologically active surface comprising positively charged (cationic) quaternary ammonium groups
and incubated using specific buffer conditions (0.1 M Tris-HCl, 0.02% Triton X-100, pH 8.5) to allow
for binding of said biomolecules to the above-mentioned biologically active surface.
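
By way of illustration only, assigning an observed peak to one of the listed apparent molecular masses
amounts to checking whether the observed mass falls within the stated ± window. The sketch below uses
a small subset of the masses listed above; the function name and the observed values are hypothetical.

```python
# Subset of the listed (apparent molecular mass, tolerance) pairs, in Da.
MASS_WINDOWS_DA = [
    (2020, 10),
    (2049, 10),
    (4242, 21),
    (4830, 24),
    (4865, 24),
    (5772, 29),
    (22951, 115),
]


def match_listed_masses(observed_mass_da, windows=MASS_WINDOWS_DA):
    """Return every listed mass whose window contains the observed mass.

    A list is returned because neighbouring windows can overlap,
    e.g. 4830 Da ± 24 Da and 4865 Da ± 24 Da.
    """
    return [
        nominal
        for nominal, tolerance in windows
        if abs(observed_mass_da - nominal) <= tolerance
    ]


matches = match_listed_masses(4250.0)
```

Where an observed mass falls into two overlapping windows, further information would be needed to
decide between the candidates; this sketch simply reports both.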

According to the invention, a biomolecule with the molecular mass of 2020 Da ± 10 Da, 2049 Da ± 10 Da,
2270 Da ± 11 Da, 2508 Da ± 13 Da, 2732 Da ± 14 Da, 3026 Da ± 15 Da, 3227 Da ± 17 Da,
3326 Da ± 17 Da, 3456 Da ± 17 Da, 3946 Da ± 20 Da, 4103 Da ± 21 Da, 4242 Da ± 21 Da,
4295 Da ± 21 Da, 4359 Da ± 22 Da, 4476 Da ± 22 Da, 4546 Da ± 23 Da, 4607 Da ± 23 Da,
4719 Da ± 24 Da, 4830 Da ± 24 Da, 4865 Da ± 24 Da, 4963 Da ± 25 Da, 5112 Da ± 26 Da,
5226 Da ± 26 Da, 5493 Da ± 27 Da, 5648 Da ± 28 Da, 5772 Da ± 29 Da, 5854 Da ± 29 Da,
6446 Da ± 32 Da, 6644 Da ± 33 Da, 6852 Da ± 34 Da, 6897 Da ± 34 Da, 6999 Da ± 35 Da,
7575 Da ± 38 Da, 7657 Da ± 38 Da, 8076 Da ± 40 Da, 8215 Da ± 41 Da, 8474 Da ± 42 Da,
8574 Da ± 43 Da, 8702 Da ± 44 Da, 8780 Da ± 44 Da, 8922 Da ± 45 Da, 9078 Da ± 45 Da,
9143 Da ± 46 Da, 9201 Da ± 46 Da, 9359 Da ± 47 Da, 9425 Da ± 47 Da, 9581 Da ± 48 Da,
9641 Da ± 48 Da, 9718 Da ± 49 Da, 9930 Da ± 50 Da, 10215 Da ± 51 Da, 10369 Da ± 52 Da,
10440 Da ± 52 Da, 10594 Da ± 53 Da, 11216 Da ± 56 Da, 11464 Da ± 57 Da, 11547 Da ± 58 Da,
11693 Da ± 58 Da, 11905 Da ± 60 Da, 12470 Da ± 62 Da, 12619 Da ± 63 Da, 12828 Da ± 64 Da,
13290 Da ± 66 Da, 13632 Da ± 68 Da, 13784 Da ± 69 Da, 13983 Da ± 70 Da, 14798 Da ± 74 Da,
15005 Da ± 75 Da, 15140 Da ± 76 Da, 15350 Da ± 77 Da, 15879 Da ± 79 Da, 15957 Da ± 80 Da,
16104 Da ± 81 Da, 16164 Da ± 81 Da, 16953 Da ± 85 Da, 17263 Da ± 86 Da, 17397 Da ± 87 Da,
17617 Da ± 88 Da, 17766 Da ± 89 Da, 17890 Da ± 89 Da, 18115 Da ± 91 Da, 18390 Da ± 92 Da,
22338 Da ± 112 Da, 22466 Da ± 112 Da, 22676 Da ± 113 Da, 22951 Da ± 115 Da, 24079 Da ± 120 Da,
28055 Da ± 140 Da, or 28259 Da ± 141 Da is detected by diluting the biological sample 1:5 in a
denaturation buffer consisting of 7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT, and 2% Ampholine,
and then 1:10 in binding buffer consisting of 0.1 M Tris-HCl, 0.02% Triton X-100 at pH 8.5 at 0 to
4°C, applying the thus treated sample to a biologically active surface comprising positively charged
(cationic) quaternary ammonium groups (anion exchanging), incubating for 120 minutes at 20 to
24°C, and subjecting the bound biomolecules to gas phase ion spectrometry as described in another
section.

A biomolecule of the invention may include any molecule that is produced by a cell or living
organism, and may have any biochemical property (e.g. phosphorylated proteins, positively charged
molecules, negatively charged molecules, hydrophobicity, hydrophilicity), but preferably biochemical
properties that allow binding of the biomolecule to a biologically active surface comprising positively
charged quaternary ammonium groups after denaturation in 7 M urea, 2 M thiourea, 4% CHAPS, 1%
DTT, and 2% Ampholine and dilution in 0.1 M Tris-HCl, 0.02% Triton X-100 at pH 8.5 at 0 to 4°C,
followed by incubation on said biologically active surface for 120 minutes at 20 to 24°C. Such
molecules include, but are not limited to, molecules comprising nucleotides, amino acids, sugars, fatty
acids, steroids, nucleic acids, polynucleotides (DNA or RNA), polypeptides, proteins, antibodies,
carbohydrates, lipids, and combinations thereof (e.g. glycoproteins, ribonucleoproteins, lipoproteins).
Preferably a biomolecule may be a nucleotide, polynucleotide, peptide, protein or fragments thereof.
Even more preferred are peptide or protein biomolecules or fragments thereof.

The methods for detecting these biomolecules have many applications. For example, a single
biomolecule or a combination of more than one biomolecule selected from the group having an
apparent molecular mass of 2020 Da ± 10 Da, 2049 Da ± 10 Da, 2270 Da ± 11 Da, 2508 Da ± 13 Da,
2732 Da ± 14 Da, 3026 Da ± 15 Da, 3227 Da ± 17 Da, 3326 Da ± 17 Da, 3456 Da ± 17 Da,
3946 Da ± 20 Da, 4103 Da ± 21 Da, 4242 Da ± 21 Da, 4295 Da ± 21 Da, 4359 Da ± 22 Da,
4476 Da ± 22 Da, 4546 Da ± 23 Da, 4607 Da ± 23 Da, 4719 Da ± 24 Da, 4830 Da ± 24 Da,
4865 Da ± 24 Da, 4963 Da ± 25 Da, 5112 Da ± 26 Da, 5226 Da ± 26 Da, 5493 Da ± 27 Da,
5648 Da ± 28 Da, 5772 Da ± 29 Da, 5854 Da ± 29 Da, 6446 Da ± 32 Da, 6644 Da ± 33 Da,
6852 Da ± 34 Da, 6897 Da ± 34 Da, 6999 Da ± 35 Da, 7575 Da ± 38 Da, 7657 Da ± 38 Da,
8076 Da ± 40 Da, 8215 Da ± 41 Da, 8474 Da ± 42 Da, 8574 Da ± 43 Da, 8702 Da ± 44 Da,
8780 Da ± 44 Da, 8922 Da ± 45 Da, 9078 Da ± 45 Da, 9143 Da ± 46 Da, 9201 Da ± 46 Da,
9359 Da ± 47 Da, 9425 Da ± 47 Da, 9581 Da ± 48 Da, 9641 Da ± 48 Da, 9718 Da ± 49 Da,
9930 Da ± 50 Da, 10215 Da ± 51 Da, 10369 Da ± 52 Da, 10440 Da ± 52 Da, 10594 Da ± 53 Da,
11216 Da ± 56 Da, 11464 Da ± 57 Da, 11547 Da ± 58 Da, 11693 Da ± 58 Da, 11905 Da ± 60 Da,
12470 Da ± 62 Da, 12619 Da ± 63 Da, 12828 Da ± 64 Da, 13290 Da ± 66 Da, 13632 Da ± 68 Da,
13784 Da ± 69 Da, 13983 Da ± 70 Da, 14798 Da ± 74 Da, 15005 Da ± 75 Da, 15140 Da ± 76 Da,
15350 Da ± 77 Da, 15879 Da ± 79 Da, 15957 Da ± 80 Da, 16104 Da ± 81 Da, 16164 Da ± 81 Da,
16953 Da ± 85 Da, 17263 Da ± 86 Da, 17397 Da ± 87 Da, 17617 Da ± 88 Da, 17766 Da ± 89 Da,
17890 Da ± 89 Da, 18115 Da ± 91 Da, 18390 Da ± 92 Da, 22338 Da ± 112 Da, 22466 Da ± 112 Da,
22676 Da ± 113 Da, 22951 Da ± 115 Da, 24079 Da ± 120 Da, 28055 Da ± 140 Da, or 28259 Da ± 141
Da can be measured to differentiate between healthy subjects, subjects having a precancerous lesion of
the large intestine, subjects having colorectal cancer, subjects having a metastasised colorectal cancer
or subjects with a non-malignant disease of the large intestine, and thus are useful as an aid in the
diagnosis of a colorectal cancer and/or a non-malignant disease of the large intestine within a subject.
Alternatively, said biomolecules may be used to diagnose a subject as healthy.

For example, a biomolecule having an apparent molecular mass of about 4242 Da is present only
in biological samples from patients having a metastasised colorectal cancer. Mass profiling of two test
samples from different subjects, X and Y, reveals the presence of a biomolecule with the apparent
molecular mass of about 4242 Da in a sample from test subject X, and the absence of said biomolecule
in a test sample from subject Y. The medical practitioner is able to diagnose subject X as having a
metastasised colorectal cancer and subject Y as not having a metastasised colorectal cancer. In yet
another example, three biomolecules having the apparent molecular masses of about 5772 Da, 2020 Da
and 22951 Da are present in varying quantities in samples specific for precancerous lesions and
"early" colorectal cancers. The biomolecule having the apparent molecular mass of 5772 Da is present
at higher levels in samples specific for precancerous lesions of the large intestine than in those specific
for "early" colorectal cancers. A biomolecule having an apparent molecular mass of 2020 Da is detected
in samples from subjects having "early" colorectal cancers but not in those having a precancerous
lesion, whereas the biomolecule having the molecular mass of 22951 Da is present in about the same
quantity in both sample types. Such biomolecules are not present in samples from healthy subjects,
which contain only those having apparent molecular masses of 8780 Da and 16104 Da. Analysis of a
test sample reveals the presence of biomolecules having the molecular masses of 22951 Da, 5772 Da
and 2020 Da. Comparison of the quantity of the biomolecules within said sample reveals that the
biomolecule with an apparent molecular mass of 5772 Da is present at lower levels than those found in
samples from subjects having a precancerous lesion. The medical practitioner is able to diagnose the
test subject as having an "early" colorectal cancer. These examples are solely used for the purpose of
clarification and are not intended to limit the scope of this invention.
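
The first example above (subjects X and Y) can be sketched as a presence/absence check on a mass
profile. The profiles are represented here as dictionaries mapping apparent molecular mass (Da) to
intensity; this representation, the function name and all values are hypothetical and serve only to
illustrate the reasoning, not to describe a diagnostic procedure.

```python
def has_peak(profile, nominal_mass_da, tolerance_da):
    """True if the profile contains a peak within the given mass window."""
    return any(abs(mass - nominal_mass_da) <= tolerance_da for mass in profile)


# Hypothetical mass profiles: apparent molecular mass (Da) -> intensity.
profile_x = {4240.0: 3.1, 8780.0: 1.0}
profile_y = {8781.0: 1.2}

# Window of the biomolecule used in the example: 4242 Da ± 21 Da.
x_has_marker = has_peak(profile_x, 4242, 21)
y_has_marker = has_peak(profile_y, 4242, 21)
```

In the second example, the decision additionally depends on comparing peak intensities, which would
require reference intensity levels from comparable samples.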

In another aspect of the invention, an immunoassay can be used to determine the presence or absence
of a biomolecule within a test sample of a subject. First, the presence or absence of a biomolecule
within a sample can be detected using the various immunoassay methods known to those skilled in the
art (e.g. ELISA, western blots). If a biomolecule is present in the test sample, it will form an
antibody-marker complex with an antibody that specifically binds the biomolecule under suitable
incubation conditions. The amount of an antibody-biomolecule complex can be determined by
comparing to a standard.

The invention provides a method for the differential diagnosis of a colorectal cancer and/or a
non-malignant disease of the large intestine comprising detecting one or more differentially expressed
biomolecules within a sample. This method comprises obtaining a test sample from a subject,
contacting said sample with a binding molecule specific for a differentially expressed polypeptide,
detecting an interaction between the binding molecule and its specific polypeptide, wherein the
detection of an interaction indicates the presence or absence of said polypeptide, thereby allowing for
the differential diagnosis of a subject as healthy, having a precancerous lesion of the large intestine,
having a colorectal cancer, having a metastasised colorectal cancer and/or a non-malignant disease of
the large intestine. Binding molecules include, but are not limited to, proteins, peptides, nucleotides,
nucleic acids, hormones, amino acids, sugars, fatty acids, steroids, polynucleotides, carbohydrates,
lipids, or a combination thereof (e.g. glycoproteins, ribonucleoproteins, lipoproteins), compounds or
synthetic molecules. Preferably, binding molecules are antibodies specific for biomolecules selected
from the group having an apparent molecular mass of 2020 Da ± 10 Da, 2049 Da ± 10 Da,
2270 Da ± 11 Da, 2508 Da ± 13 Da, 2732 Da ± 14 Da, 3026 Da ± 15 Da, 3227 Da ± 17 Da,
3326 Da ± 17 Da, 3456 Da ± 17 Da, 3946 Da ± 20 Da, 4103 Da ± 21 Da, 4242 Da ± 21 Da,
4295 Da ± 21 Da, 4359 Da ± 22 Da, 4476 Da ± 22 Da, 4546 Da ± 23 Da, 4607 Da ± 23 Da,
4719 Da ± 24 Da, 4830 Da ± 24 Da, 4865 Da ± 24 Da, 4963 Da ± 25 Da, 5112 Da ± 26 Da,
5226 Da ± 26 Da, 5493 Da ± 27 Da, 5648 Da ± 28 Da, 5772 Da ± 29 Da, 5854 Da ± 29 Da,
6446 Da ± 32 Da, 6644 Da ± 33 Da, 6852 Da ± 34 Da, 6897 Da ± 34 Da, 6999 Da ± 35 Da,
7575 Da ± 38 Da, 7657 Da ± 38 Da, 8076 Da ± 40 Da, 8215 Da ± 41 Da, 8474 Da ± 42 Da,
8574 Da ± 43 Da, 8702 Da ± 44 Da, 8780 Da ± 44 Da, 8922 Da ± 45 Da, 9078 Da ± 45 Da,
9143 Da ± 46 Da, 9201 Da ± 46 Da, 9359 Da ± 47 Da, 9425 Da ± 47 Da, 9581 Da ± 48 Da,
9641 Da ± 48 Da, 9718 Da ± 49 Da, 9930 Da ± 50 Da, 10215 Da ± 51 Da, 10369 Da ± 52 Da,
10440 Da ± 52 Da, 10594 Da ± 53 Da, 11216 Da ± 56 Da, 11464 Da ± 57 Da, 11547 Da ± 58 Da,
11693 Da ± 58 Da, 11905 Da ± 60 Da, 12470 Da ± 62 Da, 12619 Da ± 63 Da, 12828 Da ± 64 Da,
13290 Da ± 66 Da, 13632 Da ± 68 Da, 13784 Da ± 69 Da, 13983 Da ± 70 Da, 14798 Da ± 74 Da,
15005 Da ± 75 Da, 15140 Da ± 76 Da, 15350 Da ± 77 Da, 15879 Da ± 79 Da, 15957 Da ± 80 Da,
16104 Da ± 81 Da, 16164 Da ± 81 Da, 16953 Da ± 85 Da, 17263 Da ± 86 Da, 17397 Da ± 87 Da,
17617 Da ± 88 Da, 17766 Da ± 89 Da, 17890 Da ± 89 Da, 18115 Da ± 91 Da, 18390 Da ± 92 Da,
22338 Da ± 112 Da, 22466 Da ± 112 Da, 22676 Da ± 113 Da, 22951 Da ± 115 Da, 24079 Da ± 120 Da,
28055 Da ± 140 Da, or 28259 Da ± 141 Da.

In another aspect of the invention, a method for detecting the differential presence of one or more
biomolecules selected from the group having an apparent molecular mass of 2020 Da ± 10 Da,
2049 Da ± 10 Da, 2270 Da ± 11 Da, 2508 Da ± 13 Da, 2732 Da ± 14 Da, 3026 Da ± 15 Da,
3227 Da ± 17 Da, 3326 Da ± 17 Da, 3456 Da ± 17 Da, 3946 Da ± 20 Da, 4103 Da ± 21 Da,
4242 Da ± 21 Da, 4295 Da ± 21 Da, 4359 Da ± 22 Da, 4476 Da ± 22 Da, 4546 Da ± 23 Da,
4607 Da ± 23 Da, 4719 Da ± 24 Da, 4830 Da ± 24 Da, 4865 Da ± 24 Da, 4963 Da ± 25 Da,
5112 Da ± 26 Da, 5226 Da ± 26 Da, 5493 Da ± 27 Da, 5648 Da ± 28 Da, 5772 Da ± 29 Da,
5854 Da ± 29 Da, 6446 Da ± 32 Da, 6644 Da ± 33 Da, 6852 Da ± 34 Da, 6897 Da ± 34 Da,
6999 Da ± 35 Da, 7575 Da ± 38 Da, 7657 Da ± 38 Da, 8076 Da ± 40 Da, 8215 Da ± 41 Da,
8474 Da ± 42 Da, 8574 Da ± 43 Da, 8702 Da ± 44 Da, 8780 Da ± 44 Da, 8922 Da ± 45 Da,
9078 Da ± 45 Da, 9143 Da ± 46 Da, 9201 Da ± 46 Da, 9359 Da ± 47 Da, 9425 Da ± 47 Da,
9581 Da ± 48 Da, 9641 Da ± 48 Da, 9718 Da ± 49 Da, 9930 Da ± 50 Da, 10215 Da ± 51 Da,
10369 Da ± 52 Da, 10440 Da ± 52 Da, 10594 Da ± 53 Da, 11216 Da ± 56 Da, 11464 Da ± 57 Da,
11547 Da ± 58 Da, 11693 Da ± 58 Da, 11905 Da ± 60 Da, 12470 Da ± 62 Da, 12619 Da ± 63 Da,
12828 Da ± 64 Da, 13290 Da ± 66 Da, 13632 Da ± 68 Da, 13784 Da ± 69 Da, 13983 Da ± 70 Da,
14798 Da ± 74 Da, 15005 Da ± 75 Da, 15140 Da ± 76 Da, 15350 Da ± 77 Da, 15879 Da ± 79 Da,
15957 Da ± 80 Da, 16104 Da ± 81 Da, 16164 Da ± 81 Da, 16953 Da ± 85 Da, 17263 Da ± 86 Da,
17397 Da ± 87 Da, 17617 Da ± 88 Da, 17766 Da ± 89 Da, 17890 Da ± 89 Da, 18115 Da ± 91 Da,
18390 Da ± 92 Da, 22338 Da ± 112 Da, 22466 Da ± 112 Da, 22676 Da ± 113 Da, 22951 Da ± 115 Da,
24079 Da ± 120 Da, 28055 Da ± 140 Da, or 28259 Da ± 141 Da in a test sample of a subject involves
contacting the test sample with a compound or agent capable of detecting said biomolecule such that
the presence of said biomolecule is directly and/or indirectly labelled. For example, a fluorescently
labelled secondary antibody can be used to detect a primary antibody bound to its specific
biomolecule. Furthermore, such detection methods can be used to detect a variety of biomolecules
within a test sample both in vitro as well as in vivo.

In more than one embodiment of the invention, the test sample used for the differential diagnosis of a
colorectal cancer and/or a non-malignant disease of the large intestine of a subject may be of blood,
blood serum, plasma, nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic fluid,
excreta, tears, saliva, sweat, biopsy, ascites, cerebrospinal fluid, milk, lymph, or tissue extract origin.
Preferably, test samples are of blood, blood serum, plasma, urine, excreta, prostatic fluid, biopsy,
ascites, lymph or tissue extract origin. More preferred are blood, blood serum, plasma, urine, excreta,
biopsy, lymph or tissue extract samples. Even more preferred are blood serum, urine, excreta or biopsy
samples. Overall preferred are blood serum samples.

Furthermore, test samples used for the methods of the invention are isolated from subjects of
mammalian origin, preferably of primate origin. Even more preferred are subjects of human origin.

In addition, the methods of the invention for the differential diagnosis of healthy subjects, subjects
having a precancerous lesion of the large intestine, subjects having a colorectal cancer, subjects having
a metastasised colorectal cancer or subjects having a non-malignant disease of the large intestine
described herein may be combined with other diagnostic methods to improve the outcome of the
differential diagnosis. Other diagnostic methods are known to those skilled in the art.

b) Database

In another aspect of the invention, a database comprising mass profiles specific for healthy subjects,
subjects having a precancerous lesion of the large intestine, subjects having a colorectal cancer,
subjects having a metastasised colorectal cancer, or subjects having a non-malignant disease of the
large intestine is generated by contacting biological samples isolated from the above-mentioned subjects
with an adsorbent on a biologically active surface under specific binding conditions, allowing the
biomolecules within said sample to bind said adsorbent, detecting one or more bound biomolecules
using a detection method wherein the detection method generates a mass profile of said sample,
transforming the mass profile data into a computer-readable form and applying a mathematical
algorithm to classify the mass profile as specific for healthy subjects, subjects having a precancerous
lesion of the large intestine, subjects having a colorectal cancer, subjects having a metastasised
colorectal cancer, or subjects having a non-malignant disease of the large intestine.

According to the invention, the classification of said mass profiles is performed using the "CART"
decision tree approach (classification and regression trees; Breiman et al., 1984), which is known to
those skilled in the art. Furthermore, bagging of classifiers is applied to overcome typical instabilities
of forward variable selection procedures, thereby increasing overall classifier performance (Breiman,
1994).
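
The idea of bagging can be illustrated with a minimal sketch. It is not the classifier used in the invention: for brevity it replaces full CART trees with single-split decision stumps chosen by Gini impurity, and the function names, the toy peak intensities and the class labels are all hypothetical. Each stump is trained on a bootstrap resample of the training profiles, and the final class is the majority vote.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(X, y):
    """Pick the single (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    n = len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:
        # No usable split (all rows identical): always predict the majority class.
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]

def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

def fit_bagged(X, y, n_models=25, seed=0):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagged(models, row):
    """Majority vote over all bootstrap-trained stumps."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: each row holds the intensities of two peaks.
X = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [1.1, 5.3],
     [3.0, 5.0], [3.2, 4.9], [2.9, 5.2], [3.1, 5.1]]
y = ["healthy"] * 4 + ["cancer"] * 4
models = fit_bagged(X, y)
```

Because every stump sees a slightly different resample, a split that is unstable on one resample is outvoted by the others, which is the stabilising effect that bagging is intended to provide.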

In more than one embodiment, one or more biomolecules selected from the group having an apparent
molecular mass of
2020 Da ± 10 Da, 2049 Da ± 10 Da, 2270 Da ± 11 Da, 2508 Da ± 13 Da, 2732 Da ± 14 Da,
3026 Da ± 15 Da, 3227 Da ± 17 Da, 3326 Da ± 17 Da, 3456 Da ± 17 Da, 3946 Da ± 20 Da,
4103 Da ± 21 Da, 4242 Da ± 21 Da, 4295 Da ± 21 Da, 4359 Da ± 22 Da, 4476 Da ± 22 Da,
4546 Da ± 23 Da, 4607 Da ± 23 Da, 4719 Da ± 24 Da, 4830 Da ± 24 Da, 4865 Da ± 24 Da,
4963 Da ± 25 Da, 5112 Da ± 26 Da, 5226 Da ± 26 Da, 5493 Da ± 27 Da, 5648 Da ± 28 Da,
5772 Da ± 29 Da, 5854 Da ± 29 Da, 6446 Da ± 32 Da, 6644 Da ± 33 Da, 6852 Da ± 34 Da,
6897 Da ± 34 Da, 6999 Da ± 35 Da, 7575 Da ± 38 Da, 7657 Da ± 38 Da, 8076 Da ± 40 Da,
8215 Da ± 41 Da, 8474 Da ± 42 Da, 8574 Da ± 43 Da, 8702 Da ± 44 Da, 8780 Da ± 44 Da,
8922 Da ± 45 Da, 9078 Da ± 45 Da, 9143 Da ± 46 Da, 9201 Da ± 46 Da, 9359 Da ± 47 Da,
9425 Da ± 47 Da, 9581 Da ± 48 Da, 9641 Da ± 48 Da, 9718 Da ± 49 Da, 9930 Da ± 50 Da,
10215 Da ± 51 Da, 10369 Da ± 52 Da, 10440 Da ± 52 Da, 10594 Da ± 53 Da, 11216 Da ± 56 Da,
11464 Da ± 57 Da, 11547 Da ± 58 Da, 11693 Da ± 58 Da, 11905 Da ± 60 Da, 12470 Da ± 62 Da,
12619 Da ± 63 Da, 12828 Da ± 64 Da, 13290 Da ± 66 Da, 13632 Da ± 68 Da, 13784 Da ± 69 Da,
13983 Da ± 70 Da, 14798 Da ± 74 Da, 15005 Da ± 75 Da, 15140 Da ± 76 Da, 15350 Da ± 77 Da,
15879 Da ± 79 Da, 15957 Da ± 80 Da, 16104 Da ± 81 Da, 16164 Da ± 81 Da, 16953 Da ± 85 Da,
17263 Da ± 86 Da, 17397 Da ± 87 Da, 17617 Da ± 88 Da, 17766 Da ± 89 Da, 17890 Da ± 89 Da,
18115 Da ± 91 Da, 18390 Da ± 92 Da, 22338 Da ± 112 Da, 22466 Da ± 112 Da, 22676 Da ± 113 Da,
22951 Da ± 115 Da, 24079 Da ± 120 Da, 28055 Da ± 140 Da, or 28259 Da ± 141 Da
may be detected within a given biological sample. Detection of said biomolecules of the invention
is based on specific sample pre-treatment conditions, the pH of binding conditions, and the type of
biologically active surface used for the detection of biomolecules.

Within the context of the invention, biomolecules within a given sample are bound to an adsorbent on
a biologically active surface under specific binding conditions. For example, the biomolecules within a
given sample are applied to a biologically active surface comprising positively charged quaternary
ammonium groups (cationic) and incubated with 0.1 M Tris-HCl, 0.02% Triton X-100 at a pH of 8.5
to allow for specific binding. Biomolecules that bind to said biologically active surface under these
conditions are negatively charged molecules. It should be noted that although the biomolecules of the
invention are bound to a cationic adsorbent comprising positively charged quaternary ammonium
groups, the biomolecules are capable of binding other types of adsorbents, as described in another
section, using binding conditions known to those skilled in the art. Accordingly, some embodiments of
the invention are not limited to the use of cationic adsorbents.

According to the invention, a biomolecule with the molecular mass of
2020 Da ± 10 Da, 2049 Da ± 10 Da, 2270 Da ± 11 Da, 2508 Da ± 13 Da, 2732 Da ± 14 Da,
3026 Da ± 15 Da, 3227 Da ± 17 Da, 3326 Da ± 17 Da, 3456 Da ± 17 Da, 3946 Da ± 20 Da,
4103 Da ± 21 Da, 4242 Da ± 21 Da, 4295 Da ± 21 Da, 4359 Da ± 22 Da, 4476 Da ± 22 Da,
4546 Da ± 23 Da, 4607 Da ± 23 Da, 4719 Da ± 24 Da, 4830 Da ± 24 Da, 4865 Da ± 24 Da,
4963 Da ± 25 Da, 5112 Da ± 26 Da, 5226 Da ± 26 Da, 5493 Da ± 27 Da, 5648 Da ± 28 Da,
5772 Da ± 29 Da, 5854 Da ± 29 Da, 6446 Da ± 32 Da, 6644 Da ± 33 Da, 6852 Da ± 34 Da,
6897 Da ± 34 Da, 6999 Da ± 35 Da, 7575 Da ± 38 Da, 7657 Da ± 38 Da, 8076 Da ± 40 Da,
8215 Da ± 41 Da, 8474 Da ± 42 Da, 8574 Da ± 43 Da, 8702 Da ± 44 Da, 8780 Da ± 44 Da,
8922 Da ± 45 Da, 9078 Da ± 45 Da, 9143 Da ± 46 Da, 9201 Da ± 46 Da, 9359 Da ± 47 Da,
9425 Da ± 47 Da, 9581 Da ± 48 Da, 9641 Da ± 48 Da, 9718 Da ± 49 Da, 9930 Da ± 50 Da,
10215 Da ± 51 Da, 10369 Da ± 52 Da, 10440 Da ± 52 Da, 10594 Da ± 53 Da, 11216 Da ± 56 Da,
11464 Da ± 57 Da, 11547 Da ± 58 Da, 11693 Da ± 58 Da, 11905 Da ± 60 Da, 12470 Da ± 62 Da,
12619 Da ± 63 Da, 12828 Da ± 64 Da, 13290 Da ± 66 Da, 13632 Da ± 68 Da, 13784 Da ± 69 Da,
13983 Da ± 70 Da, 14798 Da ± 74 Da, 15005 Da ± 75 Da, 15140 Da ± 76 Da, 15350 Da ± 77 Da,
15879 Da ± 79 Da, 15957 Da ± 80 Da, 16104 Da ± 81 Da, 16164 Da ± 81 Da, 16953 Da ± 85 Da,
17263 Da ± 86 Da, 17397 Da ± 87 Da, 17617 Da ± 88 Da, 17766 Da ± 89 Da, 17890 Da ± 89 Da,
18115 Da ± 91 Da, 18390 Da ± 92 Da, 22338 Da ± 112 Da, 22466 Da ± 112 Da, 22676 Da ± 113 Da,
22951 Da ± 115 Da, 24079 Da ± 120 Da, 28055 Da ± 140 Da, or 28259 Da ± 141 Da
is detected by diluting the biological sample 1:5 in a denaturation buffer consisting of 7 M urea,
2 M thiourea, 4% CHAPS, 1% DTT, and 2% Ampholine, and then 1:10 in binding buffer consisting of
0.1 M Tris-HCl, 0.02% Triton X-100 at pH 8.5 at 0 to 4°C, applying the thus treated sample to a
biologically active surface comprising positively charged (cationic) quaternary ammonium groups
(anion exchanging), incubating for 120 minutes at 20 to 24°C, and subjecting the bound biomolecules
to gas phase ion spectrometry as described in another section.

In one embodiment of the invention, biological samples used to generate a database of mass profiles
for healthy subjects, subjects having a precancerous lesion of the large intestine, subjects having a
colorectal cancer, subjects having a metastasised colorectal cancer or subjects having a non-malignant
disease of the large intestine, may be of blood, blood serum, plasma, nipple aspirate, urine, semen,
seminal fluid, seminal plasma, prostatic fluid, excreta, tears, saliva, sweat, biopsy, ascites,
cerebrospinal fluid, milk, lymph, or tissue extract origin. Preferably, biological samples are of blood,
blood serum, plasma, urine, excreta, prostatic fluid, biopsy, ascites, lymph or tissue extract origin.
More preferred are blood, blood serum, plasma, urine, excreta, biopsy, lymph or tissue extract
samples. Even more preferred are blood serum, urine, excreta or biopsy samples. Overall preferred are
blood serum samples.

Furthermore, the biological samples related to the invention are isolated from subjects considered to
be healthy, having a precancerous lesion of the large intestine, having a colorectal cancer, having a
metastasised colorectal cancer or having a non-malignant disease of the large intestine. Said subjects
are of mammalian origin, preferably of primate origin. Even more preferred are subjects of human
origin.

A subject of the invention that is said to have a precancerous lesion of the large intestine displays
preliminary stages of a cancer (i.e. dysplasia), wherein a cell and/or tissue has become susceptible to
the development of a cancer as a result of either a genetic predisposition, exposure to a cancer-causing
agent (carcinogen) or both.

A genetic predisposition may include a predisposition for an autosomal dominant inherited cancer
syndrome, which is generally indicated by a strong family history of uncommon cancer and/or an
association with a specific marker phenotype (e.g. familial adenomatous polyps of the colon); a
familial cancer, wherein an evident clustering of cancer is observed but the role of inherited
predisposition may not be clear (e.g. breast cancer, ovarian cancer, or colon cancer); or an autosomal
recessive syndrome characterised by chromosomal or DNA instability. Cancer-causing agents, in
turn, include agents that cause genetic damage and induce neoplastic transformation of a cell. Such
agents fall into three categories: 1) chemical carcinogens such as alkylating agents, polycyclic
aromatic hydrocarbons, aromatic amines, azo dyes, nitrosamines and amides, asbestos, vinyl chloride,
chromium, nickel, arsenic, and naturally occurring carcinogens (e.g. aflatoxin B1); 2) radiation such as
ultraviolet (UV) and ionising radiation, including electromagnetic (e.g. x-rays, γ-rays) and particulate
radiation (e.g. α and β particles, protons, neutrons); 3) viral and microbial carcinogens such as human
papillomavirus (HPV), Epstein-Barr virus (EBV), hepatitis B virus (HBV), human T-cell leukaemia
virus type 1 (HTLV-1), or Helicobacter pylori.

Alternatively, a subject within the invention that is said to have a colorectal cancer possesses a cancer
that arises from the large intestine (interchangeably referred to as colorectal cancers within the
invention). Such cancers may include, but are not limited to, colon and rectal cancers.

Within the context of the invention, cancers of the large intestine (interchangeably referred to as
colorectal cancers within the invention) may also be of various stages, wherein the staging is based on
the size of the primary lesion, its extent of spread to regional lymph nodes, and the presence or absence
of blood-borne metastases (metastatic colorectal cancers). The various stages of a cancer may be
identified using staging systems known to those skilled in the art [e.g. the Union Internationale Contre
le Cancer (UICC) system or the American Joint Committee on Cancer (AJCC) system]. Also included are
different grades of said cancers, wherein the grade of a cancer is based on the degree of differentiation
of the epithelial cells within the lining of the large intestine and the number of mitoses as a correlate
of the neoplasm's aggressiveness.



Healthy individuals, as related to certain embodiments of the invention, are those that possess good
health, and demonstrate an absence of a colorectal cancer or a non-malignant disease of the large
intestine.

c) Biomolecules 

The differential expression of biomolecules in samples from healthy subjects, subjects having a
precancerous lesion of the large intestine, subjects having a colorectal cancer, subjects having
metastasised colorectal cancer, and subjects having a non-malignant disease of the large intestine,
allows for the differential diagnosis of a non-malignant disease or a cancer of the large intestine within
a subject.

Biomolecules are said to be specific for a particular clinical state (e.g. healthy, precancerous lesion of
the large intestine, colorectal cancer, metastasised colorectal cancer, a non-malignant disease of the
large intestine) when they are present at different levels within samples taken from subjects in one
clinical state as compared to samples taken from subjects in other clinical states (e.g. in subjects
with a precancerous lesion of the large intestine vs. in subjects with a metastasised colorectal cancer).

Biomolecules may be present at elevated levels, at decreased levels, or altogether absent within a
sample taken from a subject in a particular clinical state (e.g. healthy, precancerous lesion of the large
intestine, colorectal cancer, metastasised colorectal cancer, a non-malignant disease of the large
intestine). For example, biomolecules A and B are found at elevated levels in samples isolated from
healthy subjects as compared to samples isolated from subjects having a precancerous lesion of the
large intestine, a colorectal cancer, a metastatic colorectal cancer or a non-malignant disease of the
large intestine, whereas biomolecules X, Y and Z are found at elevated levels and/or more frequently in
samples isolated from subjects having a precancerous lesion of the large intestine as opposed to
subjects in good health or subjects having a colorectal cancer, a metastasised colorectal cancer or a
non-malignant disease of the large intestine. Biomolecules A and B are then said to be specific for
healthy subjects, whereas biomolecules X, Y and Z are specific for subjects having a precancerous
lesion of the large intestine.

Accordingly, the differential presence of one or more biomolecules found in a test sample compared to
samples from healthy subjects, subjects with a precancerous lesion of the large intestine, a colorectal
cancer, a metastasised colorectal cancer, or a non-malignant disease of the large intestine, or the mere
detection of one or more biomolecules in the test sample, provides useful information regarding the
probability of whether a subject being tested has a precancerous lesion of the large intestine, a
colorectal cancer, a metastasised colorectal cancer or a non-malignant disease of the large intestine.
The probability that a subject being tested has a precancerous lesion of the large intestine, a colorectal
cancer, a metastasised colorectal cancer or a non-malignant disease of the large intestine depends on
whether the quantity of one or more biomolecules in a test sample taken from said subject is
statistically significantly different from the quantity of one or more biomolecules in a biological
sample taken from healthy subjects, subjects having a precancerous lesion of the large intestine, a
colorectal cancer, a metastasised colorectal cancer, or a non-malignant disease of the large intestine.
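
The text does not name a particular statistical test. As one possible illustration only, the sketch below uses a two-sided permutation test on the difference in mean peak intensity between two groups of samples. The function name, the number of permutations and the intensity values are hypothetical and are not taken from the invention.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference of group means.

    Returns the fraction of random relabellings whose mean difference is at
    least as large as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of samples to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)

# Hypothetical intensities of one peak in two clinical groups.
healthy = [10.1, 9.8, 10.4, 10.0, 9.9, 10.2]
cancer = [14.9, 15.3, 15.1, 14.8, 15.0, 15.2]
p = permutation_p_value(healthy, cancer)
```

A small p-value indicates that the observed difference would rarely arise if group membership were unrelated to peak intensity. With many candidate peaks, a correction for multiple testing would also be needed.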

A biomolecule of the invention may be any molecule that is produced by a cell or living organism, and
may have any biochemical property (e.g. phosphorylated proteins, positively charged molecules,
negatively charged molecules, hydrophobicity, hydrophilicity), but preferably biochemical properties
that allow binding of the biomolecule to a biologically active surface comprising positively charged
quaternary ammonium groups after denaturation in 7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT, and
2% Ampholine and dilution in 0.1 M Tris-HCl, 0.02% Triton X-100 at pH 8.5 at 0 to 4°C followed by
incubation on said biologically active surface for 120 minutes at 20 to 24°C. Such molecules include,
but are not limited to, molecules comprising nucleotides, amino acids, sugars, fatty acids, steroids,
nucleic acids, polynucleotides (DNA or RNA), polypeptides, proteins, antibodies, carbohydrates,
lipids, and combinations thereof (e.g., glycoproteins, ribonucleoproteins, lipoproteins). Preferably a
biomolecule may be a nucleotide, polynucleotide, peptide, protein or fragments thereof. Even more
preferred are peptide or protein biomolecules.

The biomolecules of the invention can be detected based on specific sample pre-treatment conditions,
the pH of binding conditions, the type of biologically active surface used for the detection of
biomolecules within a given sample and their molecular mass. For example, prior to the detection of
the biomolecules described herein, a given sample is pre-treated by diluting 1:5 in a denaturation
buffer consisting of 7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT, and 2% ampholine. The denatured
sample is then diluted 1:10 in 0.1 M Tris-HCl, 0.02% Triton X-100, pH 8.5, applied to a biologically
active surface comprising positively charged quaternary ammonium groups (cationic) and incubated
using specific buffer conditions (0.1 M Tris-HCl, 0.02% Triton X-100, pH 8.5) to allow for binding of
said biomolecules to the above-mentioned biologically active surface. It should be noted that although
the biomolecules of the invention are detected using a cationic adsorbent comprising positively charged
quaternary ammonium groups, as well as specific pre-treatment and binding conditions, the
biomolecules are capable of binding other types of adsorbents, as described below, using alternative
pre-treatment and binding conditions known to those skilled in the art. Accordingly, some
embodiments of the invention are not limited to the use of cationic adsorbents.

The biomolecules of the invention include biomolecules having a molecular mass selected from the
group consisting of
2020 Da ± 10 Da, 2049 Da ± 10 Da, 2270 Da ± 11 Da, 2508 Da ± 13 Da, 2732 Da ± 14 Da,
3026 Da ± 15 Da, 3227 Da ± 17 Da, 3326 Da ± 17 Da, 3456 Da ± 17 Da, 3946 Da ± 20 Da,
4103 Da ± 21 Da, 4242 Da ± 21 Da, 4295 Da ± 21 Da, 4359 Da ± 22 Da, 4476 Da ± 22 Da,
4546 Da ± 23 Da, 4607 Da ± 23 Da, 4719 Da ± 24 Da, 4830 Da ± 24 Da, 4865 Da ± 24 Da,
4963 Da ± 25 Da, 5112 Da ± 26 Da, 5226 Da ± 26 Da, 5493 Da ± 27 Da, 5648 Da ± 28 Da,
5772 Da ± 29 Da, 5854 Da ± 29 Da, 6446 Da ± 32 Da, 6644 Da ± 33 Da, 6852 Da ± 34 Da,
6897 Da ± 34 Da, 6999 Da ± 35 Da, 7575 Da ± 38 Da, 7657 Da ± 38 Da, 8076 Da ± 40 Da,
8215 Da ± 41 Da, 8474 Da ± 42 Da, 8574 Da ± 43 Da, 8702 Da ± 44 Da, 8780 Da ± 44 Da,
8922 Da ± 45 Da, 9078 Da ± 45 Da, 9143 Da ± 46 Da, 9201 Da ± 46 Da, 9359 Da ± 47 Da,
9425 Da ± 47 Da, 9581 Da ± 48 Da, 9641 Da ± 48 Da, 9718 Da ± 49 Da, 9930 Da ± 50 Da,
10215 Da ± 51 Da, 10369 Da ± 52 Da, 10440 Da ± 52 Da, 10594 Da ± 53 Da, 11216 Da ± 56 Da,
11464 Da ± 57 Da, 11547 Da ± 58 Da, 11693 Da ± 58 Da, 11905 Da ± 60 Da, 12470 Da ± 62 Da,
12619 Da ± 63 Da, 12828 Da ± 64 Da, 13290 Da ± 66 Da, 13632 Da ± 68 Da, 13784 Da ± 69 Da,
13983 Da ± 70 Da, 14798 Da ± 74 Da, 15005 Da ± 75 Da, 15140 Da ± 76 Da, 15350 Da ± 77 Da,
15879 Da ± 79 Da, 15957 Da ± 80 Da, 16104 Da ± 81 Da, 16164 Da ± 81 Da, 16953 Da ± 85 Da,
17263 Da ± 86 Da, 17397 Da ± 87 Da, 17617 Da ± 88 Da, 17766 Da ± 89 Da, 17890 Da ± 89 Da,
18115 Da ± 91 Da, 18390 Da ± 92 Da, 22338 Da ± 112 Da, 22466 Da ± 112 Da, 22676 Da ± 113 Da,
22951 Da ± 115 Da, 24079 Da ± 120 Da, 28055 Da ± 140 Da, or 28259 Da ± 141 Da.
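
Assigning an observed peak to one of the listed masses amounts to a tolerance-window comparison, which the following minimal sketch illustrates. Only a small subset of the (mass, tolerance) pairs from the list above is included, and the function name and the observed masses are hypothetical. Because some windows overlap (e.g. 16104 Da ± 81 Da and 16164 Da ± 81 Da), the function returns every matching marker rather than a single one.

```python
# (nominal mass, tolerance) pairs in daltons; a subset copied from the list above.
MARKERS = [
    (2020, 10),
    (4103, 21),
    (9143, 46),
    (16104, 81),
    (16164, 81),
    (28259, 141),
]

def match_markers(observed_mass, markers=MARKERS):
    """Return the nominal masses whose tolerance window contains observed_mass."""
    return [mass for mass, tol in markers if abs(observed_mass - mass) <= tol]

# Hypothetical observed peak masses.
single = match_markers(4110.0)      # falls inside 4103 Da ± 21 Da only
ambiguous = match_markers(16130.0)  # falls inside two overlapping windows
none = match_markers(5000.0)        # outside every window in this subset
```

An ambiguous match such as the second call would have to be resolved by other evidence, for example the peak shape or the binding behaviour of the biomolecule.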

According to the invention, a biomolecule with the molecular mass of
2020 Da ± 10 Da, 2049 Da ± 10 Da, 2270 Da ± 11 Da, 2508 Da ± 13 Da, 2732 Da ± 14 Da,
3026 Da ± 15 Da, 3227 Da ± 17 Da, 3326 Da ± 17 Da, 3456 Da ± 17 Da, 3946 Da ± 20 Da,
4103 Da ± 21 Da, 4242 Da ± 21 Da, 4295 Da ± 21 Da, 4359 Da ± 22 Da, 4476 Da ± 22 Da,
4546 Da ± 23 Da, 4607 Da ± 23 Da, 4719 Da ± 24 Da, 4830 Da ± 24 Da, 4865 Da ± 24 Da,
4963 Da ± 25 Da, 5112 Da ± 26 Da, 5226 Da ± 26 Da, 5493 Da ± 27 Da, 5648 Da ± 28 Da,
5772 Da ± 29 Da, 5854 Da ± 29 Da, 6446 Da ± 32 Da, 6644 Da ± 33 Da, 6852 Da ± 34 Da,
6897 Da ± 34 Da, 6999 Da ± 35 Da, 7575 Da ± 38 Da, 7657 Da ± 38 Da, 8076 Da ± 40 Da,
8215 Da ± 41 Da, 8474 Da ± 42 Da, 8574 Da ± 43 Da, 8702 Da ± 44 Da, 8780 Da ± 44 Da,
8922 Da ± 45 Da, 9078 Da ± 45 Da, 9143 Da ± 46 Da, 9201 Da ± 46 Da, 9359 Da ± 47 Da,
9425 Da ± 47 Da, 9581 Da ± 48 Da, 9641 Da ± 48 Da, 9718 Da ± 49 Da, 9930 Da ± 50 Da,
10215 Da ± 51 Da, 10369 Da ± 52 Da, 10440 Da ± 52 Da, 10594 Da ± 53 Da, 11216 Da ± 56 Da,
11464 Da ± 57 Da, 11547 Da ± 58 Da, 11693 Da ± 58 Da, 11905 Da ± 60 Da, 12470 Da ± 62 Da,
12619 Da ± 63 Da, 12828 Da ± 64 Da, 13290 Da ± 66 Da, 13632 Da ± 68 Da, 13784 Da ± 69 Da,
13983 Da ± 70 Da, 14798 Da ± 74 Da, 15005 Da ± 75 Da, 15140 Da ± 76 Da, 15350 Da ± 77 Da,
15879 Da ± 79 Da, 15957 Da ± 80 Da, 16104 Da ± 81 Da, 16164 Da ± 81 Da, 16953 Da ± 85 Da,
17263 Da ± 86 Da, 17397 Da ± 87 Da, 17617 Da ± 88 Da, 17766 Da ± 89 Da, 17890 Da ± 89 Da,
18115 Da ± 91 Da, 18390 Da ± 92 Da, 22338 Da ± 112 Da, 22466 Da ± 112 Da, 22676 Da ± 113 Da,
22951 Da ± 115 Da, 24079 Da ± 120 Da, 28055 Da ± 140 Da, or 28259 Da ± 141 Da
is detected by diluting the biological sample 1:5 in a denaturation buffer consisting of 7 M urea,
2 M thiourea, 4% CHAPS, 1% DTT, and 2% Ampholine, and then 1:10 in binding buffer consisting of
0.1 M Tris-HCl, 0.02% Triton X-100 at pH 8.5 at 0 to 4°C, applying the thus treated sample to a
biologically active surface comprising positively charged (cationic) quaternary ammonium groups
(anion exchanging), incubating for 120 minutes at 20 to 24°C, and subjecting the bound biomolecules
to gas phase ion spectrometry as described in another section.
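
The arithmetic of the two-step dilution can be worked through in a short sketch. It rests on an assumption that the text does not state: that "1:5" and "1:10" each mean one part of sample in five (or ten) parts of total volume. If the ratios instead mean one part of sample plus five (or ten) parts of buffer, the figures change accordingly. The function and variable names are hypothetical.

```python
from fractions import Fraction

def serial_dilution(factors):
    """Overall dilution factor of consecutive dilution steps."""
    total = Fraction(1)
    for factor in factors:
        total *= Fraction(factor)
    return total

# Assumption: 1:5 = 1 part sample in 5 parts total; 1:10 = 1 part in 10 parts total.
overall = serial_dilution([5, 10])     # sample ends up 50-fold diluted
sample_fraction = 1 / overall          # 1/50 of the final volume is original sample

# Urea carried over from the denaturation buffer (7 M stock):
# the buffer makes up 4/5 of the first mixture, which is then diluted 10-fold.
urea_after_step1 = Fraction(7) * Fraction(4, 5)   # 28/5 M = 5.6 M
urea_final = urea_after_step1 / 10                # 14/25 M = 0.56 M
```

Under this reading, the serum applied to the surface is diluted 50-fold overall and the urea concentration falls from 5.6 M in the denaturation step to 0.56 M during binding.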

Although said biomolecules were first identified in blood serum samples, their detection is not limited
to said sample type. The biomolecules may also be detected in other sample types, such as blood,
blood serum, plasma, nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic fluid,
excreta, tears, saliva, sweat, biopsy, ascites, cerebrospinal fluid, milk, lymph, or tissue extract.
Preferably, samples are of blood, blood serum, plasma, urine, excreta, prostatic fluid, biopsy, ascites,
lymph or tissue extract origin. More preferred are blood, blood serum, plasma, urine, excreta, biopsy,
lymph or tissue extract samples. Even more preferred are blood serum, urine, excreta or biopsy
samples. Overall preferred are blood serum samples.

Since the biomolecules can be sufficiently characterised by their mass and biochemical characteristics
such as the type of biologically active surface they bind to or the pH of binding conditions, it is not
necessary to determine the identity of the biomolecules in order to be able to identify them in a sample.
It should be noted that molecular mass and binding properties are characteristic properties of these
biomolecules and not limitations on the means of detection or isolation. Furthermore, using the methods
described herein, or other methods known in the art, the absolute identity of the markers can be
determined. This is important when one wishes to develop and/or screen for specific binding molecules,
or to develop an assay for the detection of said biomolecules using specific binding molecules.

d) Biologically Active Surfaces

In one embodiment of the invention, biologically active surfaces include, but are not restricted to,
surfaces that contain adsorbents such as quaternary ammonium groups (anion exchange surfaces),
carboxylate groups (cation exchange surfaces), alkyl or aryl chains (hydrophobic interaction, reverse
phase chemistry), groups such as nitrilotriacetic acid that immobilise metal ions such as nickel, gallium,
copper, or zinc (metal affinity interaction), or biomolecules such as proteins, preferably antibodies, or
nucleic acids, preferably protein binding sequences, covalently bound to the surface via carbonyl
diimidazole moieties or epoxy groups (specific affinity interaction). Preferred are adsorbents
comprising anion exchange surfaces.

These surfaces may be located on matrices like polysaccharides such as sepharose, e.g. anion
exchange surfaces or hydrophobic interaction surfaces, or solid metals, e.g. antibodies coupled to
magnetic beads. Surfaces may also include gold-plated surfaces such as those used for Biacore Sensor
Chip technology. Other surfaces known to those skilled in the art are also included within the scope of
the invention.

Biologically active surfaces are able to adsorb biomolecules like amino acids, sugars, fatty acids,
steroids, nucleic acids, polynucleotides, polypeptides, carbohydrates, lipids, and combinations thereof
(e.g., glycoproteins, ribonucleoproteins, lipoproteins).

In another embodiment, devices that use biologically active surfaces to selectively adsorb
biomolecules may be chromatography columns for Fast Protein Liquid Chromatography (FPLC) and
High Pressure Liquid Chromatography (HPLC), where the matrix, e.g. a polysaccharide, carrying the
biologically active surface, is filled into vessels (usually referred to as "columns") made of glass, steel,
or synthetic materials like polyetheretherketone (PEEK).

In yet another embodiment, devices that use biologically active surfaces to selectively adsorb
biomolecules may be metal strips carrying thin layers of the biologically active surface on one or more
spots of the strip surface to be used as probes for gas phase ion spectrometry analysis, for example the
SAX2 ProteinChip array (Ciphergen Biosystems, Inc.) for SELDI analysis.



e) Mass Profiling

In one embodiment, the mass profile of a sample may be generated using an array-based assay in
which the biomolecules of a given sample are bound by biochemical or affinity interactions to an
adsorbent present on a biologically active surface located on a solid platform ("array" or "probe").

After the biomolecules have bound to the adsorbent, they are detected using gas phase ion
spectrometry. Biomolecules or other substances bound to the adsorbents on the probes can be analysed
using a gas phase ion spectrometer. This includes, e.g., mass spectrometers, ion mobility
spectrometers, or total ion current measuring devices. The quantity and characteristics of the
biomolecule can be determined using gas phase ion spectrometry. Other substances in addition to the
biomolecule of interest can also be detected by gas phase ion spectrometry.

In one embodiment, a mass spectrometer can be used to detect biomolecules on the probe. In a typical
mass spectrometer, a probe with a biomolecule is introduced into an inlet system of the mass
spectrometer. The biomolecule is then ionised by an ionisation source, such as a laser, fast atom
bombardment, or plasma. The generated ions are collected by an ion optic assembly, and then a mass
analyser disperses and analyses the passing ions. Within the scope of this invention, the ionisation
source that ionises the biomolecule is a laser.

The ions exiting the mass analyser are detected by an ion detector. The ion detector then translates
information of the detected ions into mass-to-charge ratios. Detection of the presence of a biomolecule
or other substances will typically involve detection of signal intensity. This, in turn, can reflect the
quantity and character of a biomolecule bound to the probe.

In another embodiment, the mass profile of a sample may be generated using a liquid chromatography
(LC)-based assay in which the biomolecules of a given sample are bound by biochemical or affinity
interactions to an adsorbent located in a vessel made of glass, steel, or synthetic material, known to
those skilled in the art as a chromatography column. The biomolecules are eluted from the biologically
active surface by washing the vessel with appropriate solutions known to those skilled in the art. Such
solutions include, but are not limited to, buffers, e.g. Tris(hydroxymethyl)aminomethane
hydrochloride (TRIS-HCl), buffers containing salt, e.g. sodium chloride (NaCl), or organic solvents,
e.g. acetonitrile. Biomolecule mass profiles are generated by application of the eluting biomolecules of
the sample by direct connection via an electrospray device to a mass spectrometer (LC/ESI-MS).

Conditions that promote binding of biomolecules to an adsorbent are known to those skilled in the art
(reference) and ordinarily include parameters such as pH, the concentration of salt, organic solvent, or
other competitors for binding of the biomolecule to the adsorbent. Within the scope of the invention,
incubation temperatures are in the range of 0 to 100°C, preferably 4 to 60°C, and most preferably
15 to 50°C. Varying additional parameters, such as incubation time, the concentration of
detergent, e.g. 3-[(3-cholamidopropyl)dimethylammonio]-2-hydroxy-1-propanesulfonate (CHAPS),
or reducing agents, e.g. dithiothreitol (DTT), are also known to those skilled in the art. Various
degrees of binding can be accomplished by combining the above stated conditions as needed, and will
be readily apparent to those skilled in the art.

f) Methods for detecting biomolecules within a sample

In yet another aspect, the invention relates to methods for detecting differentially present biomolecules
in a test sample and/or biological sample. Within the context of the invention, any suitable method can
be used to detect one or more of the biomolecules described herein. For example, gas phase ion
spectrometry can be used. This technique includes, e.g., laser desorption/ionisation mass spectrometry.
Preferably, the test and/or biological sample is prepared prior to gas phase ion spectrometry, e.g. by
pre-fractionation, two-dimensional gel chromatography, high performance liquid chromatography, etc.,
to assist detection of said biomolecules. Detection of said biomolecules can also be achieved using
methods other than gas phase ion spectrometry. For example, immunoassays can be used to detect the
biomolecules within a sample.

In one embodiment, the test and/or biological sample is prepared prior to contacting a biologically
active surface and is in aqueous form. Examples of said samples include, but are not limited to, blood,
blood serum, plasma, nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic fluid,
tears, saliva, sweat, ascites, cerebrospinal fluid, milk, lymph, or tissue extract samples. Furthermore,
solid test and/or biological samples, such as excreta or biopsy samples, can be solubilised in or
admixed with an eluent using methods known to those skilled in the art such that said samples may be
easily applied to a biologically active surface. Test and/or biological samples in the aqueous form can
be further prepared using specific solutions for denaturation (pre-treatment) like sodium dodecyl
sulfate, mercaptoethanol, urea, etc. For example, a test and/or biological sample of the invention can
be denatured prior to contacting a biologically active surface comprising quaternary ammonium
groups by diluting said sample 1:5 with a buffer consisting of 7 M urea, 2 M thiourea, 4% CHAPS,
1% DTT and 2% ampholine.

The sample is contacted with a biologically active surface using any technique, including bathing,
soaking, dipping, spraying, washing over, or pipetting, etc. Generally, a volume of sample containing
from a few attomoles to 100 picomoles of a biomolecule in about 1 to 500 µl is sufficient for detecting
binding of the biomolecule to the adsorbent.

The pH value of the solvent in which the sample contacts the biologically active surface is a function
of the specific sample and the selected biologically active surface. Typically, a sample is contacted
with a biologically active surface under pH values between 0 and 14, preferably between about 4 and
10, more preferably between 4.5 and 9.0, and most preferably, at pH 8.5. The pH value depends on the
type of adsorbent present on a biologically active surface and can be adjusted accordingly.

The sample can contact the adsorbent present on a biologically active surface for a period of time
sufficient to allow the marker to bind to the adsorbent. Typically, the sample and the biologically
active surface are contacted for a period of between about 1 second and about 12 hours, preferably
between about 30 seconds and about 3 hours, and most preferably for 120 minutes.

The temperature at which the sample contacts the biologically active surface (incubation temperature)
is a function of the specific sample and the selected biologically active surface. Typically, the washing
solution can be at a temperature of between 0 and 100°C, preferably between 4 and 37°C, and most
preferably between 20 and 24°C.

For example, a biologically active surface comprising quaternary ammonium groups (anion
exchange surface) will bind the biomolecules described herein when the pH value is between 6.5 and
9.0. Optimal binding of the biomolecules of the present invention occurs at a pH of 8.5. Furthermore, a
sample is contacted with said biologically active surface for 120 min. at a temperature of 20 - 24 °C.

Following contacting a sample or sample solution with a biologically active surface, it is preferred to
remove any unbound biomolecules so that only the bound biomolecules remain on the biologically active
surface. Unbound biomolecules are removed by methods known to those skilled in the art
such as bathing, soaking, dipping, rinsing, spraying, or washing the biologically active surface with an
eluent or a washing solution. A microfluidics process is preferably used when a washing solution such
as an eluent is introduced to small spots of adsorbents on the biologically active surface. Typically, the
washing solution can be at a temperature of between 0 and 100°C, preferably between 4 and 37°C, and
most preferably between 20 and 24°C.

Washing solutions or eluents used to wash the unbound biomolecules from a biologically active surface
include, but are not limited to, organic solutions and aqueous solutions such as buffers, wherein a buffer
may contain detergents, salts, or reducing agents in appropriate concentrations as known to those
skilled in the art.

Aqueous solutions are preferred for washing biologically active surfaces. Exemplary aqueous
solutions include, but are not limited to, HEPES buffer, Tris buffer, phosphate buffered saline (PBS), and
modifications thereof. The selection of a particular washing solution or an eluent is dependent on other
experimental conditions (e.g., types of adsorbents used or biomolecules to be detected), and can be
determined by those of skill in the art. For example, if a biologically active surface comprising a
quaternary ammonium group as adsorbent (anion exchange surface) is used, then an aqueous solution,
such as a Tris buffer, may be preferred. In another example, if a biologically active surface comprising
a carboxylate group as adsorbent (cation exchange surface) is used, then an aqueous solution, such as
an acetate buffer, may be preferred.

Optionally, an energy absorbing molecule (EAM), e.g. in solution, can be applied to biomolecules or
other substances bound on the biologically active surface by spraying, pipetting or dipping. Applying
an EAM can be done after unbound materials are washed off the biologically active surface.
Exemplary energy absorbing molecules include, but are not limited to, cinnamic acid derivatives,
sinapinic acid and dihydroxybenzoic acid.

Once the biologically active surface is free of any unbound biomolecules, adsorbent-bound
biomolecules are detected using gas phase ion spectrometry. The quantity and characteristics of a
biomolecule can be determined using said method. Furthermore, said biomolecules can be analyzed
using a gas phase ion spectrometer such as mass spectrometers, ion mobility spectrometers, or total
ion current measuring devices. Other gas phase ion spectrometers known to those skilled in the art are
also included.



In one embodiment, mass spectrometry can be used to detect biomolecules of a given sample present
on a biologically active surface. Such methods include, but are not limited to, matrix-assisted laser
desorption ionization/time-of-flight (MALDI-TOF), surface-enhanced laser desorption
ionization/time-of-flight (SELDI-TOF), liquid chromatography coupled with MS, MS-MS, or
ESI-MS. Typically, biomolecules are analysed by introducing a biologically active surface containing
said biomolecules into the instrument and ionizing said biomolecules to generate ions that are
collected and analysed.

In a preferred embodiment, the biomolecules present in a sample are detected using gas phase ion
spectrometry, and more preferably, using mass spectrometry. In one embodiment, matrix-assisted laser
desorption/ionization ("MALDI") mass spectrometry can be used. In MALDI, the sample is typically
quasi-purified to obtain a fraction that essentially consists of a marker using separation methods such
as two-dimensional gel electrophoresis or high performance liquid chromatography (HPLC).



In another embodiment, surface-enhanced laser desorption/ionization mass spectrometry ("SELDI")
can be used. SELDI uses a substrate comprising adsorbents to capture biomolecules, which can then
be directly desorbed and ionized from the substrate surface during mass spectrometry. Since the
substrate surface in SELDI captures biomolecules, a sample need not be quasi-purified as in MALDI.
However, depending on the complexity of a sample and the type of adsorbents used, it may be
desirable to prepare a sample to reduce its complexity prior to SELDI analysis.

For example, biomolecules bound to a biologically active surface can be introduced into an inlet
system of the mass spectrometer. The biomolecules are then ionized by an ionization source such as a
laser, fast atom bombardment, or plasma. The generated ions are then collected by an ion optic
assembly, and then a mass analyzer disperses the passing ions. The ions exiting the mass analyzer are
detected by a detector and translated into mass-to-charge ratios. Detection of the presence of a
biomolecule typically involves detection of its specific signal intensity, and reflects the quantity and
character of said biomolecule.


In a preferred embodiment, a laser desorption time-of-flight mass spectrometer is used with the probe
of the present invention. In laser desorption mass spectrometry, biomolecules bound to a biologically
active surface are introduced into an inlet system. Biomolecules are desorbed and ionized into the gas
phase by a laser. The ions generated are then collected by an ion optic assembly. These ions are
accelerated through a short high voltage field and let drift into a high vacuum chamber of a
time-of-flight mass analyzer. At the far end of the high vacuum chamber, the accelerated ions strike a
sensitive detector surface at different times. Since the time-of-flight is a function of the mass of the
ions, the elapsed time between ionization and impact can be used to identify the presence or absence of
molecules of a specific mass.
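
By way of illustration only, the dependence of flight time on mass described above can be sketched for an idealised linear time-of-flight analyzer. The accelerating voltage and drift length below are hypothetical values chosen for the example and are not taken from the instrument described herein; the function name `flight_time` is likewise an assumption.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
DALTON = 1.66053906660e-27   # one dalton in kilograms


def flight_time(mass_da, charge=1, voltage=20000.0, drift_length=1.0):
    """Ideal drift time in seconds for an ion of `mass_da` daltons.

    The ion is assumed to gain kinetic energy charge * e * voltage and then
    to drift field-free over `drift_length` metres. The voltage and length
    defaults are hypothetical illustration values.
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2.0 * charge * E_CHARGE * voltage / mass_kg)
    return drift_length / velocity
```

Under these assumptions the flight time scales with the square root of the mass-to-charge ratio, so an ion of four times the mass arrives after twice the time.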



The detection of biomolecules described herein can be enhanced using certain selectivity conditions
(e.g., types of adsorbents used or washing solutions). In a preferred embodiment, the same or
substantially the same selectivity conditions that were used to discover the biomolecules can be used
in the methods for detecting a biomolecule in a sample.

Combinations of the laser desorption time-of-flight mass spectrometer with other components
described herein, in the assembly of a mass spectrometer that employs various means of desorption,
acceleration, detection, measurement of time, etc., are known to those skilled in the art.

Data generated by desorption and detection of markers can be analyzed with the use of a
programmable digital computer. The computer program generally contains a readable medium that
stores codes. Certain codes can be devoted to memory that include the location of each feature on a
biologically active surface, the identity of the adsorbent at that feature and the elution conditions used
to wash the adsorbent. Using this information, the program can then identify the set of features on the
biologically active surface defining certain selectivity characteristics (e.g. types of adsorbent and
eluents used). The computer also contains codes that receive as input data on the strength of the
signal at various molecular masses received from a particular addressable location on the biologically
active surface. This data can indicate the number of biomolecules detected, as well as the strength of
the signal and the determined molecular mass for each biomolecule detected.

Data analysis can include the steps of determining signal strength (e.g., height of peaks) of a
biomolecule detected and removing "outliers" (data deviating from a predetermined statistical
distribution). For example, the observed peaks can be normalized, a process whereby the height of
each peak relative to some reference is calculated. For example, a reference can be background noise
generated by instrument and chemicals (e.g., energy absorbing molecule), which is set as zero in the
scale. Then the signal strength detected for each biomolecule can be displayed in the form of relative
intensities in the scale desired (e.g., 100). Alternatively, a standard may be admitted with the sample
so that a peak from the standard can be used as a reference to calculate relative intensities of the
signals observed for each biomolecule or other biomolecules detected.
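
As a non-limiting sketch of the normalization just described, the background level can be subtracted from each peak height (so that the background sits at zero) and the result expressed on a scale of 100. Scaling so that the tallest peak equals the top of the scale is an assumption made for this example, as is the function name `relative_intensities`.

```python
def relative_intensities(peak_heights, background, scale=100.0):
    """Express peak heights relative to a background reference.

    The background level is set as zero; the tallest corrected peak is
    mapped to `scale` (an assumption made for this illustration).
    """
    corrected = [max(height - background, 0.0) for height in peak_heights]
    top = max(corrected, default=0.0)
    if top == 0.0:
        # Nothing rises above the background: every relative intensity is zero.
        return [0.0 for _ in corrected]
    return [scale * value / top for value in corrected]
```

For instance, peak heights of 12, 7 and 2 with a background of 2 would be displayed as 100, 50 and 0.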

The computer can transform the resulting data into various formats for displaying. In one format,
referred to as "spectrum view", a standard spectral view can be displayed, wherein the view depicts
the quantity of a biomolecule reaching the detector at each particular molecular mass. In another
format, referred to as "scatter plot", only the peak height and mass information are retained from the
spectrum view, yielding a cleaner image and enabling biomolecules with nearly identical molecular
mass to be more visible.

Using any of the above display formats, it can be readily determined from the signal display whether a
biomolecule having a particular molecular mass is detected from a sample. Preferred biomolecules of
the invention are biomolecules with an apparent molecular mass of about
2020 Da ± 10 Da, 2049 Da ± 10 Da, 2270 Da ± 11 Da, 2508 Da ± 13 Da, 2732 Da ± 14 Da, 3026 Da ± 15 Da,
3227 Da ± 17 Da, 3326 Da ± 17 Da, 3456 Da ± 17 Da, 3946 Da ± 20 Da, 4103 Da ± 21 Da, 4242 Da ± 21 Da,
4295 Da ± 21 Da, 4359 Da ± 22 Da, 4476 Da ± 22 Da, 4546 Da ± 23 Da, 4607 Da ± 23 Da, 4719 Da ± 24 Da,
4830 Da ± 24 Da, 4865 Da ± 24 Da, 4963 Da ± 25 Da, 5112 Da ± 26 Da, 5226 Da ± 26 Da, 5493 Da ± 27 Da,
5648 Da ± 28 Da, 5772 Da ± 29 Da, 5854 Da ± 29 Da, 6446 Da ± 32 Da, 6644 Da ± 33 Da, 6852 Da ± 34 Da,
6897 Da ± 34 Da, 6999 Da ± 35 Da, 7575 Da ± 38 Da, 7657 Da ± 38 Da, 8076 Da ± 40 Da, 8215 Da ± 41 Da,
8474 Da ± 42 Da, 8574 Da ± 43 Da, 8702 Da ± 44 Da, 8780 Da ± 44 Da, 8922 Da ± 45 Da, 9078 Da ± 45 Da,
9143 Da ± 46 Da, 9201 Da ± 46 Da, 9359 Da ± 47 Da, 9425 Da ± 47 Da, 9581 Da ± 48 Da, 9641 Da ± 48 Da,
9718 Da ± 49 Da, 9930 Da ± 50 Da, 10215 Da ± 51 Da, 10369 Da ± 52 Da, 10440 Da ± 52 Da, 10594 Da ± 53 Da,
11216 Da ± 56 Da, 11464 Da ± 57 Da, 11547 Da ± 58 Da, 11693 Da ± 58 Da, 11905 Da ± 60 Da, 12470 Da ± 62 Da,
12619 Da ± 63 Da, 12828 Da ± 64 Da, 13290 Da ± 66 Da, 13632 Da ± 68 Da, 13784 Da ± 69 Da, 13983 Da ± 70 Da,
14798 Da ± 74 Da, 15005 Da ± 75 Da, 15140 Da ± 76 Da, 15350 Da ± 77 Da, 15879 Da ± 79 Da, 15957 Da ± 80 Da,
16104 Da ± 81 Da, 16164 Da ± 81 Da, 16953 Da ± 85 Da, 17263 Da ± 86 Da, 17397 Da ± 87 Da, 17617 Da ± 88 Da,
17766 Da ± 89 Da, 17890 Da ± 89 Da, 18115 Da ± 91 Da, 18390 Da ± 92 Da, 22338 Da ± 112 Da, 22466 Da ± 112 Da,
22676 Da ± 113 Da, 22951 Da ± 115 Da, 24079 Da ± 120 Da, 28055 Da ± 140 Da, or 28259 Da ± 141 Da.
Moreover, from the strength of signal, the amount of a biomolecule bound on the biologically active
surface can be determined.
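
Purely as an illustration, deciding whether an observed peak falls within the stated tolerance of one of the listed masses can be sketched as follows. Only a small subset of the masses is included for brevity, and the names `MARKERS` and `matching_markers` are hypothetical.

```python
# Subset of the apparent molecular masses listed above, as
# {mass in Da: tolerance in Da}. The full list would be handled the same way.
MARKERS = {2020: 10, 2049: 10, 2270: 11, 5112: 26, 28259: 141}


def matching_markers(observed_mass, markers=MARKERS):
    """Return the listed masses whose tolerance window contains the observed mass."""
    return [
        mass
        for mass, tolerance in sorted(markers.items())
        if abs(observed_mass - mass) <= tolerance
    ]
```

An observed peak at 2025 Da would match the 2020 Da ± 10 Da biomolecule, whereas a peak at 2035 Da would fall between the windows of the 2020 Da and 2049 Da biomolecules and match neither.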


g) Identification of proteins

In case the biomolecules of the invention are proteins, the present invention comprises a method for
the identification of these proteins, especially by obtaining their amino acid sequence. This method
comprises the purification of said proteins from the complex biological sample (blood, blood serum,
plasma, nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic fluid, tears, saliva,
sweat, ascites, cerebrospinal fluid, milk, lymph, or tissue extract samples) by fractionating said sample
using techniques known by one of ordinary skill in the art, most preferably protein
chromatography (FPLC, HPLC).

The biomolecules of the invention include those proteins with a molecular mass selected from
2020 Da ± 10 Da, 2049 Da ± 10 Da, 2270 Da ± 11 Da, 2508 Da ± 13 Da, 2732 Da ± 14 Da, 3026 Da ± 15 Da,
3227 Da ± 17 Da, 3326 Da ± 17 Da, 3456 Da ± 17 Da, 3946 Da ± 20 Da, 4103 Da ± 21 Da, 4242 Da ± 21 Da,
4295 Da ± 21 Da, 4359 Da ± 22 Da, 4476 Da ± 22 Da, 4546 Da ± 23 Da, 4607 Da ± 23 Da, 4719 Da ± 24 Da,
4830 Da ± 24 Da, 4865 Da ± 24 Da, 4963 Da ± 25 Da, 5112 Da ± 26 Da, 5226 Da ± 26 Da, 5493 Da ± 27 Da,
5648 Da ± 28 Da, 5772 Da ± 29 Da, 5854 Da ± 29 Da, 6446 Da ± 32 Da, 6644 Da ± 33 Da, 6852 Da ± 34 Da,
6897 Da ± 34 Da, 6999 Da ± 35 Da, 7575 Da ± 38 Da, 7657 Da ± 38 Da, 8076 Da ± 40 Da, 8215 Da ± 41 Da,
8474 Da ± 42 Da, 8574 Da ± 43 Da, 8702 Da ± 44 Da, 8780 Da ± 44 Da, 8922 Da ± 45 Da, 9078 Da ± 45 Da,
9143 Da ± 46 Da, 9201 Da ± 46 Da, 9359 Da ± 47 Da, 9425 Da ± 47 Da, 9581 Da ± 48 Da, 9641 Da ± 48 Da,
9718 Da ± 49 Da, 9930 Da ± 50 Da, 10215 Da ± 51 Da, 10369 Da ± 52 Da, 10440 Da ± 52 Da, 10594 Da ± 53 Da,
11216 Da ± 56 Da, 11464 Da ± 57 Da, 11547 Da ± 58 Da, 11693 Da ± 58 Da, 11905 Da ± 60 Da, 12470 Da ± 62 Da,
12619 Da ± 63 Da, 12828 Da ± 64 Da, 13290 Da ± 66 Da, 13632 Da ± 68 Da, 13784 Da ± 69 Da, 13983 Da ± 70 Da,
14798 Da ± 74 Da, 15005 Da ± 75 Da, 15140 Da ± 76 Da, 15350 Da ± 77 Da, 15879 Da ± 79 Da, 15957 Da ± 80 Da,
16104 Da ± 81 Da, 16164 Da ± 81 Da, 16953 Da ± 85 Da, 17263 Da ± 86 Da, 17397 Da ± 87 Da, 17617 Da ± 88 Da,
17766 Da ± 89 Da, 17890 Da ± 89 Da, 18115 Da ± 91 Da, 18390 Da ± 92 Da, 22338 Da ± 112 Da, 22466 Da ± 112 Da,
22676 Da ± 113 Da, 22951 Da ± 115 Da, 24079 Da ± 120 Da, 28055 Da ± 140 Da, and 28259 Da ± 141 Da.

Furthermore, the method comprises the analysis of the fractions for the presence and purity of said
proteins by the method which was used to identify them as differentially expressed biomolecules, for
example two-dimensional gel electrophoresis or SELDI mass spectrometry, but most preferably
SELDI mass spectrometry. The method also comprises an analysis of the purified proteins aimed at
revealing their amino acid sequence. This analysis may be performed using techniques
in mass spectrometry known to those skilled in the art.

In one embodiment, this analysis may be performed using peptide mass fingerprinting, revealing
information about the specific peptide mass profile after proteolytic digestion of the investigated
protein.
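
The idea behind a peptide mass profile can be sketched, for illustration only, by an in-silico digestion. The sketch assumes trypsin as the protease (cleavage after lysine or arginine unless followed by proline), which the text above does not specify, and uses standard monoisotopic residue masses; the function names and the example sequence are hypothetical.

```python
import re

# Monoisotopic residue masses in Da for the twenty standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of one water molecule added to every free peptide


def tryptic_peptides(sequence):
    """Split a sequence after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide in Da."""
    return sum(MONO[residue] for residue in peptide) + WATER


def mass_fingerprint(sequence):
    """Sorted list of peptide masses expected from the assumed digestion."""
    return sorted(peptide_mass(p) for p in tryptic_peptides(sequence))
```

The resulting list of masses is the kind of profile that would be compared against database entries; missed cleavages and modifications are ignored in this sketch.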

In another embodiment, this analysis may be preferably performed using post-source-decay (PSD), or
MSMS, but most preferably MSMS, revealing mass information about all possible fragments of the
investigated protein or proteolytic peptides thereof, leading to the amino acid sequence of the
investigated protein or proteolytic peptide thereof.

The information revealed by the aforementioned techniques can be used to feed world-wide-web
search engines, such as MS Fit (Protein Prospector, http://prospector.ucsf.edu) for information
obtained from peptide mass fingerprinting, or MS Tag (Protein Prospector, http://prospector.ucsf.edu)
for information obtained from PSD, or Mascot (www.matrixscience.com) for information obtained
from MSMS and peptide mass fingerprinting, for the alignment of the obtained results with data
available in public protein sequence databases, such as SwissProt (http://us.expasy.org/sprot/), NCBI
(http://www.ncbi.nlm.nih.gov/BLAST/), EMBL (http://srs.embl-heidelberg.de:8000/srs5/), which leads
to confident information about the identity of said proteins.

This information may comprise, if available, the complete amino acid sequence, the calculated
molecular mass, the structure, the enzymatic activity, the physiological function, and gene expression
of the investigated proteins.

h) Kits

In yet another aspect, the invention provides kits using the methods of the invention as described in the
section Diagnostics for the differential diagnosis of colorectal cancer or a non-malignant disease of the
large intestine, wherein the kits are used to detect the biomolecules of the present invention.

The biomolecules of the invention include those proteins with a molecular mass selected from
2020 Da ± 10 Da, 2049 Da ± 10 Da, 2270 Da ± 11 Da, 2508 Da ± 13 Da, 2732 Da ± 14 Da, 3026 Da ± 15 Da,
3227 Da ± 17 Da, 3326 Da ± 17 Da, 3456 Da ± 17 Da, 3946 Da ± 20 Da, 4103 Da ± 21 Da, 4242 Da ± 21 Da,
4295 Da ± 21 Da, 4359 Da ± 22 Da, 4476 Da ± 22 Da, 4546 Da ± 23 Da, 4607 Da ± 23 Da, 4719 Da ± 24 Da,
4830 Da ± 24 Da, 4865 Da ± 24 Da, 4963 Da ± 25 Da, 5112 Da ± 26 Da, 5226 Da ± 26 Da, 5493 Da ± 27 Da,
5648 Da ± 28 Da, 5772 Da ± 29 Da, 5854 Da ± 29 Da, 6446 Da ± 32 Da, 6644 Da ± 33 Da, 6852 Da ± 34 Da,
6897 Da ± 34 Da, 6999 Da ± 35 Da, 7575 Da ± 38 Da, 7657 Da ± 38 Da, 8076 Da ± 40 Da, 8215 Da ± 41 Da,
8474 Da ± 42 Da, 8574 Da ± 43 Da, 8702 Da ± 44 Da, 8780 Da ± 44 Da, 8922 Da ± 45 Da, 9078 Da ± 45 Da,
9143 Da ± 46 Da, 9201 Da ± 46 Da, 9359 Da ± 47 Da, 9425 Da ± 47 Da, 9581 Da ± 48 Da, 9641 Da ± 48 Da,
9718 Da ± 49 Da, 9930 Da ± 50 Da, 10215 Da ± 51 Da, 10369 Da ± 52 Da, 10440 Da ± 52 Da, 10594 Da ± 53 Da,
11216 Da ± 56 Da, 11464 Da ± 57 Da, 11547 Da ± 58 Da, 11693 Da ± 58 Da, 11905 Da ± 60 Da, 12470 Da ± 62 Da,
12619 Da ± 63 Da, 12828 Da ± 64 Da, 13290 Da ± 66 Da, 13632 Da ± 68 Da, 13784 Da ± 69 Da, 13983 Da ± 70 Da,
14798 Da ± 74 Da, 15005 Da ± 75 Da, 15140 Da ± 76 Da, 15350 Da ± 77 Da, 15879 Da ± 79 Da, 15957 Da ± 80 Da,
16104 Da ± 81 Da, 16164 Da ± 81 Da, 16953 Da ± 85 Da, 17263 Da ± 86 Da, 17397 Da ± 87 Da, 17617 Da ± 88 Da,
17766 Da ± 89 Da, 17890 Da ± 89 Da, 18115 Da ± 91 Da, 18390 Da ± 92 Da, 22338 Da ± 112 Da, 22466 Da ± 112 Da,
22676 Da ± 113 Da, 22951 Da ± 115 Da, 24079 Da ± 120 Da, 28055 Da ± 140 Da, or 28259 Da ± 141 Da.

For example, the kits can be used to detect one or more of the differentially present biomolecules as
described above in a test sample of a subject. The kits of the invention have many applications. For
example, the kits can be used to differentiate if a subject is healthy, or has a precancerous lesion of
the large intestine, a colorectal cancer, a metastasized colorectal cancer or a non-malignant disease of
the large intestine, thus aiding the diagnosis of colorectal cancer or a non-malignant disease of the
large intestine. In another example, the kits can be used to identify compounds that modulate
expression of said biomolecules.

In one embodiment, a kit comprises an adsorbent on a biologically active surface, wherein the
adsorbent is suitable for binding one or more biomolecules of the invention, a denaturation solution for
the pre-treatment of a sample, a binding solution, a washing solution or instructions for making a
denaturation solution, binding solution, or washing solution, wherein the combination allows for the
detection of a biomolecule using gas phase ion spectrometry. Such kits can be prepared from the
materials described in other previously detailed sections (e.g., denaturation buffer, binding buffer,
adsorbents, washing solutions, etc.).

In some embodiments, the kit may comprise a first substrate comprising an adsorbent thereon (e.g., a
particle functionalized with an adsorbent) and a second substrate onto which the first substrate can be
positioned to form a probe, which is removably insertable into a gas phase ion spectrometer. In other
embodiments, the kit may comprise a single substrate, which is in the form of a removably insertable
probe with adsorbents on the substrate.

In another embodiment, a kit comprises a binding molecule that specifically binds to a biomolecule
related to the invention, a detection reagent, appropriate solutions and instructions on how to use the
kit. Such kits can be prepared from the materials described above, and other materials known to those
skilled in the art. A binding molecule used within such a kit may include, but is not limited to,
proteins, peptides, nucleotides, nucleic acids, hormones, amino acids, sugars, fatty acids, steroids,
polynucleotides, carbohydrates, lipids, or a combination thereof (e.g. glycoproteins,
ribonucleoproteins, lipoproteins), compounds or synthetic molecules. Preferably, a binding molecule
used in said kit is an antibody.

In either embodiment, the kit may optionally further comprise a standard or control information so that
the test sample can be compared with the control information standard to determine if the test amount
of a marker detected in a sample is a diagnostic amount consistent with a diagnosis of colorectal
cancer.

The present invention also relates to the use of biomolecules having a molecular mass selected from
2020 Da ± 10 Da, 2049 Da ± 10 Da, 2270 Da ± 11 Da, 2508 Da ± 13 Da, 2732 Da ± 14 Da, 3026 Da ± 15 Da,
3227 Da ± 17 Da, 3326 Da ± 17 Da, 3456 Da ± 17 Da, 3946 Da ± 20 Da, 4103 Da ± 21 Da, 4242 Da ± 21 Da,
4295 Da ± 21 Da, 4359 Da ± 22 Da, 4476 Da ± 22 Da, 4546 Da ± 23 Da, 4607 Da ± 23 Da, 4719 Da ± 24 Da,
4830 Da ± 24 Da, 4865 Da ± 24 Da, 4963 Da ± 25 Da, 5112 Da ± 26 Da, 5226 Da ± 26 Da, 5493 Da ± 27 Da,
5648 Da ± 28 Da, 5772 Da ± 29 Da, 5854 Da ± 29 Da, 6446 Da ± 32 Da, 6644 Da ± 33 Da, 6852 Da ± 34 Da,
6897 Da ± 34 Da, 6999 Da ± 35 Da, 7575 Da ± 38 Da, 7657 Da ± 38 Da, 8076 Da ± 40 Da, 8215 Da ± 41 Da,
8474 Da ± 42 Da, 8574 Da ± 43 Da, 8702 Da ± 44 Da, 8780 Da ± 44 Da, 8922 Da ± 45 Da, 9078 Da ± 45 Da,
9143 Da ± 46 Da, 9201 Da ± 46 Da, 9359 Da ± 47 Da, 9425 Da ± 47 Da, 9581 Da ± 48 Da, 9641 Da ± 48 Da,
9718 Da ± 49 Da, 9930 Da ± 50 Da, 10215 Da ± 51 Da, 10369 Da ± 52 Da, 10440 Da ± 52 Da, 10594 Da ± 53 Da,
11216 Da ± 56 Da, 11464 Da ± 57 Da, 11547 Da ± 58 Da, 11693 Da ± 58 Da, 11905 Da ± 60 Da, 12470 Da ± 62 Da,
12619 Da ± 63 Da, 12828 Da ± 64 Da, 13290 Da ± 66 Da, 13632 Da ± 68 Da, 13784 Da ± 69 Da, 13983 Da ± 70 Da,
14798 Da ± 74 Da, 15005 Da ± 75 Da, 15140 Da ± 76 Da, 15350 Da ± 77 Da, 15879 Da ± 79 Da, 15957 Da ± 80 Da,
16104 Da ± 81 Da, 16164 Da ± 81 Da, 16953 Da ± 85 Da, 17263 Da ± 86 Da, 17397 Da ± 87 Da, 17617 Da ± 88 Da,
17766 Da ± 89 Da, 17890 Da ± 89 Da, 18115 Da ± 91 Da, 18390 Da ± 92 Da, 22338 Da ± 112 Da, 22466 Da ± 112 Da,
22676 Da ± 113 Da, 22951 Da ± 115 Da, 24079 Da ± 120 Da, 28055 Da ± 140 Da, or 28259 Da ± 141 Da
for the manufacture of an agent for diagnosis, prophylactic and/or therapeutic
treatment of non-steroid dependent cancer, preferably colorectal cancer.

The invention also relates to a method for aiding non-steroid dependent cancer diagnosis, especially
colorectal cancer, the method comprising (a) detecting at least one protein marker in a sample,
wherein the protein marker is selected from
2020 Da ± 10 Da, 2049 Da ± 10 Da, 2270 Da ± 11 Da, 2508 Da ± 13 Da, 2732 Da ± 14 Da, 3026 Da ± 15 Da,
3227 Da ± 17 Da, 3326 Da ± 17 Da, 3456 Da ± 17 Da, 3946 Da ± 20 Da, 4103 Da ± 21 Da, 4242 Da ± 21 Da,
4295 Da ± 21 Da, 4359 Da ± 22 Da, 4476 Da ± 22 Da, 4546 Da ± 23 Da, 4607 Da ± 23 Da, 4719 Da ± 24 Da,
4830 Da ± 24 Da, 4865 Da ± 24 Da, 4963 Da ± 25 Da, 5112 Da ± 26 Da, 5226 Da ± 26 Da, 5493 Da ± 27 Da,
5648 Da ± 28 Da, 5772 Da ± 29 Da, 5854 Da ± 29 Da, 6446 Da ± 32 Da, 6644 Da ± 33 Da, 6852 Da ± 34 Da,
6897 Da ± 34 Da, 6999 Da ± 35 Da, 7575 Da ± 38 Da, 7657 Da ± 38 Da, 8076 Da ± 40 Da, 8215 Da ± 41 Da,
8474 Da ± 42 Da, 8574 Da ± 43 Da, 8702 Da ± 44 Da, 8780 Da ± 44 Da, 8922 Da ± 45 Da, 9078 Da ± 45 Da,
9143 Da ± 46 Da, 9201 Da ± 46 Da, 9359 Da ± 47 Da, 9425 Da ± 47 Da, 9581 Da ± 48 Da, 9641 Da ± 48 Da,
9718 Da ± 49 Da, 9930 Da ± 50 Da, 10215 Da ± 51 Da, 10369 Da ± 52 Da, 10440 Da ± 52 Da, 10594 Da ± 53 Da,
11216 Da ± 56 Da, 11464 Da ± 57 Da, 11547 Da ± 58 Da, 11693 Da ± 58 Da, 11905 Da ± 60 Da, 12470 Da ± 62 Da,
12619 Da ± 63 Da, 12828 Da ± 64 Da, 13290 Da ± 66 Da, 13632 Da ± 68 Da, 13784 Da ± 69 Da, 13983 Da ± 70 Da,
14798 Da ± 74 Da, 15005 Da ± 75 Da, 15140 Da ± 76 Da, 15350 Da ± 77 Da, 15879 Da ± 79 Da, 15957 Da ± 80 Da,
16104 Da ± 81 Da, 16164 Da ± 81 Da, 16953 Da ± 85 Da, 17263 Da ± 86 Da, 17397 Da ± 87 Da, 17617 Da ± 88 Da,
17766 Da ± 89 Da, 17890 Da ± 89 Da, 18115 Da ± 91 Da, 18390 Da ± 92 Da, 22338 Da ± 112 Da, 22466 Da ± 112 Da,
22676 Da ± 113 Da, 22951 Da ± 115 Da, 24079 Da ± 120 Da, 28055 Da ± 140 Da, or 28259 Da ± 141 Da
and (b) correlating the detection of the protein marker with a probable
diagnosis of non-steroid dependent cancer, especially colorectal cancer.

The present invention is further illustrated by the following examples, which should not be construed
as limiting in any way. The contents of all cited references (including literature references, issued
patents, published patent applications), as cited throughout this application, are hereby expressly
incorporated by reference. The practice of the present invention will employ, unless otherwise
indicated, conventional techniques of cell biology, cell culture, molecular biology, transgenic biology,
microbiology, recombinant DNA, and immunology, which are known to those skilled in the art. Such
techniques are explained fully in the literature.

Example 1. Sample collection for colon cancer evaluation.

Serum samples were obtained from a total of 151 individuals, which included two different groups of
subjects. In the first group (group I), sera were drawn from 57 colon cancer patients undergoing
diagnosis and treatment of colon cancer at the Departments of Gastroenterology and Surgery of the
Universities of Magdeburg, Erlangen, and Cottbus (all Germany). Serum samples were collected from
the patients directly before surgery. At this time, a primary diagnosis was made based on endoscopy,
ultrasonic testing, and/or other means for the detection of colorectal cancer. In all cases the diagnosis
was confirmed by histological evaluation after surgery. Follow-up data for all colon cancer patients
are currently being collected and will be available for later studies.

The non-cancer control group (group II) consisted of 94 subjects with non-malignant disease
symptoms of the large intestine (adenoma, inflammation, diverticulosis), who were recruited from
the University Hospitals in Magdeburg, Cottbus, and Erlangen. Serum from each subject was taken
following colorectal endoscopy, wherein the absence of colorectal cancer was confirmed.
Furthermore, all subjects denied a personal history of cancer and were otherwise healthy. Follow-up
data for all non-cancer controls are currently being collected and will be available for later studies. In
addition, 77 serum samples from healthy blood donors were also collected for test-set analysis. Blood
donors are considered to be healthy individuals not suffering from severe diseases.

Example 2. ProteinChip Array analysis. 

ProteinChip Arrays of the SAX2-type (strong anion exchanger) were arranged into a bioprocessor
(Ciphergen Biosystems, Inc.), a device that contains up to 12 ProteinChips and facilitates processing
of the ProteinChips. The ProteinChips were pre-incubated in the bioprocessor with 200 µl binding
buffer (0.1 M Tris-HCl, 0.02% Triton X-100, pH 8.5). 10 µl of serum sample was diluted 1:5 in a
buffer (7 M urea, 2 M thiourea, 4% CHAPS, 1% DTT, 2% ampholine) and again diluted 1:10 in the
binding buffer. Then, 300 µl of this mixture (equivalent to 6 µl original serum sample) were directly
applied onto the spots of the SAX2 ProteinChips. In between dilution steps and prior to the application
to the spots, the sample was kept on ice (at 0°C). After incubation for 120 minutes at 20 to 24 °C the
chips were incubated with 200 µl binding buffer, before 2 x 0.5 µl EAM solution (20 mg/ml sinapinic
acid in 50% acetonitrile and 0.5% trifluoroacetic acid) was applied to the spots. After air-drying for 10
min, the ProteinChips were placed in the ProteinChip Reader (ProteinChip Biology System II,
Ciphergen Biosystems, Inc.) and time-of-flight spectra were generated by laser shots collected in the
positive mode at laser intensity 215, with the detector sensitivity of 8. Sixty laser shots were
averaged per spectrum.
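
The serum equivalent quoted above follows from the two dilution steps and can be checked with a short calculation. This is an illustrative sketch only; it reads "1:5" and "1:10" as one part in five and one part in ten total parts, and the function name `serum_equivalent` is hypothetical.

```python
def serum_equivalent(applied_ul, dilution_factors):
    """Volume of undiluted serum (in µl) contained in `applied_ul` of the final mixture.

    Each entry of `dilution_factors` is the total-parts factor of one
    dilution step (5 for 1:5, 10 for 1:10).
    """
    total_factor = 1
    for factor in dilution_factors:
        total_factor *= factor
    return applied_ul / total_factor
```

With the values of this example, 300 µl of a mixture diluted 1:5 and then 1:10 (50-fold overall) corresponds to 6 µl of original serum.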


Calibration of mass accuracy was performed by using the following mixture of mass standard calibrant
proteins: Dynorphin A (porcine, 209 - 225, 2147.50 Da), Beta-endorphin (human, 61 - 91, 3465.00
Da), Insulin (bovine, 5733.58 Da), and Cytochrome c (bovine, 12230.90 Da) at a concentration of 1.21
pmol/µl, and Myoglobin (equine cardiac, 16951.50 Da) at a concentration of 5.16 pmol/µl. 0.5 µl of
this mixture was applied to a single spot of a H4 ProteinChip array. After air-drying of the drop, 2 x 1
µl matrix solution (a saturated solution of sinapinic acid in 50% acetonitrile, 0.5% trifluoroacetic acid)
was applied to the spot. The drop was allowed to air-dry for 10 min after each application of matrix
solution.

The ProteinChip was placed in the ProteinChip Reader (Biology System II, Ciphergen Biosystems,
Inc.) and time-of-flight spectra were generated by laser shots collected in the positive mode at laser
intensity 210, with the detector sensitivity of 8. Sixty laser shots were averaged per spectrum.
Subsequently, time-of-flight values were correlated to the molecular masses of the standard proteins,
and calibration was performed according to the instrument manual.
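
The calibration itself was carried out as set out in the instrument manual, which is not reproduced here. Purely to illustrate how flight times can be correlated with the masses of the standard proteins, the following sketch assumes the common idealisation that the square root of the mass is linear in the flight time and fits that line by least squares. The model, the function names and any flight times supplied to it are assumptions, not the instrument's procedure.

```python
import math

# Masses of the calibrant proteins named above, in Da.
CALIBRANTS_DA = [2147.50, 3465.00, 5733.58, 12230.90, 16951.50]


def fit_calibration(times, masses):
    """Least-squares fit of sqrt(mass) = slope * time + intercept."""
    roots = [math.sqrt(m) for m in masses]
    n = len(times)
    mean_t = sum(times) / n
    mean_r = sum(roots) / n
    slope = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, roots)) / sum(
        (t - mean_t) ** 2 for t in times
    )
    intercept = mean_r - slope * mean_t
    return slope, intercept


def mass_from_time(time, slope, intercept):
    """Convert a flight time to a mass using the fitted calibration line."""
    return (slope * time + intercept) ** 2
```

Once fitted on the five standards, `mass_from_time` would convert the flight time of any other peak into an apparent molecular mass under the stated assumption.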


Example 3. Peak detection and data analysis.

The analysis of the data was performed by automatic peak detection and alignment using the operating
software of the ProteinChip Biology System II, the ProteinChip Software Version 3.1 (Ciphergen
Biosystems, Inc.). Figure 1 shows a comparison of protein mass spectra detected using the above
mentioned SAX2 ProteinChip arrays for samples isolated from patients suffering from non-malignant
diseases of the large intestine (e.g., acute or chronic inflammation, adenoma) (C1 and C2) and of
patients with colon cancer (T1 and T2).

The complete set of patients was randomly divided into a training set and a test set. The training set
comprised 54 randomly selected patients with colon cancer and 75 randomly selected patients
without colon cancer. The test set comprised 14 randomly selected patients with colon cancer and
19 randomly selected patients without colon cancer. Additionally, a test set comprising 77 sera
obtained from healthy blood donors was compiled. This was done in order to test the classification
algorithm generated on the basis of the spectra of the subgroup of healthy individuals (see below).
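
The random division can be sketched as follows. The group sizes (54 + 14 patients with colon cancer, 75 + 19 without) are taken from the text; the patient identifiers, the fixed seed and the helper name `split_group` are hypothetical, and the text does not state how the randomisation was actually carried out.

```python
import random

def split_group(ids, n_train, rng):
    """Randomly assign n_train ids to training; the rest go to the test set."""
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

rng = random.Random(0)  # fixed seed (an assumption) for reproducibility

# Hypothetical patient identifiers; group sizes follow the text.
cancer_ids = [f"T{i:03d}" for i in range(68)]   # 54 training + 14 test
control_ids = [f"C{i:03d}" for i in range(94)]  # 75 training + 19 test

cancer_train, cancer_test = split_group(cancer_ids, 54, rng)
control_train, control_test = split_group(control_ids, 75, rng)

train_set = cancer_train + control_train
test_set = cancer_test + control_test
```

Splitting each group separately keeps the stated number of cancer and non-cancer patients in both sets.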


The m/z values of all mass spectra selected for the analysis ranged between 2000 Da and 30000 Da,
wherein smaller masses were not used since artefacts with the "Energy Absorbing Molecule, EAM"
("Matrix") could not be excluded, and higher masses were not detected under the chosen experimental
conditions. The spectra within the train set were normalised according to the intensity of the total ion
current, followed by baseline subtraction, and automatic peak detection as previously described by
Adam et al. (ref.), using the "Biomarker Wizard" tool of the ProteinChip Software Version 3.1
(Ciphergen Biosystems, Inc.). The following settings were chosen for peak detection by "Biomarker
Wizard": a) auto-detect peaks to cluster, b) first pass: 5 signal/noise, c) minimum peak threshold: 5%
of all spectra, d) deletion of user-detected peaks below threshold, e) cluster mass window: +/- 0.3% of
mass. Using these settings, 90 signal clusters were identified.
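
Two of the steps above, normalisation to the total ion current and grouping of peaks within a +/- 0.3% mass window, can be illustrated with a minimal sketch. The scaling to the mean total ion current and the greedy grouping rule are assumptions made for illustration; the internal algorithm of the "Biomarker Wizard" tool is not described in the text, and all intensities and peak masses below are hypothetical.

```python
def normalise_by_tic(spectra):
    """Scale each spectrum so that its summed intensity equals the mean
    total ion current (TIC) of all spectra. Returns (scaled, mean_tic)."""
    tics = [sum(s) for s in spectra]
    mean_tic = sum(tics) / len(tics)
    scaled = [[x * mean_tic / tic for x in s] for s, tic in zip(spectra, tics)]
    return scaled, mean_tic

def cluster_peaks(peak_masses, window=0.003):
    """Greedy grouping of sorted peak masses: a peak joins the current
    cluster while it lies within +/- window (0.3%) of the cluster mean."""
    clusters = []
    for m in sorted(peak_masses):
        if clusters:
            centre = sum(clusters[-1]) / len(clusters[-1])
            if abs(m - centre) <= window * centre:
                clusters[-1].append(m)
                continue
        clusters.append([m])
    return clusters

# Hypothetical intensities and peak masses, for illustration only.
spectra = [[1.0, 3.0, 6.0], [2.0, 6.0, 12.0]]
scaled, mean_tic = normalise_by_tic(spectra)

peaks = [5490.0, 5493.0, 5497.0, 6645.0, 6650.0, 8781.0]
clusters = cluster_peaks(peaks)
```

In this toy input the six peaks fall into three clusters, because 0.3% of 5490 Da is about 16 Da and the first three peaks lie within that window of one another.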

The normalization coefficient generated by normalizing the spectra of the train sets and the cluster
information of the train sets generated by the "Biomarker Wizard" tool of the software were saved and
used to externally normalize the spectra of the corresponding test sets and to cluster the signals of the
corresponding test sets according to the normalization and peak identification of the train sets.
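
How saved training-set parameters might be reused on a test set can be sketched as follows. This is an illustration under assumptions: the normalization coefficient is taken to be a single target total ion current, and each test peak is assigned to the nearest training cluster within a +/- 0.3% mass window. The function names and all values are hypothetical, and the ProteinChip software may proceed differently.

```python
def normalise_external(spectrum, target_tic):
    """Scale a test spectrum to the total ion current saved from training."""
    tic = sum(spectrum)
    return [x * target_tic / tic for x in spectrum]

def assign_to_cluster(peak_mass, cluster_masses, window=0.003):
    """Return the training cluster mass nearest to peak_mass if it lies
    within +/- window (0.3%) of that cluster mass, else None."""
    nearest = min(cluster_masses, key=lambda c: abs(peak_mass - c))
    if abs(peak_mass - nearest) <= window * nearest:
        return nearest
    return None

# Hypothetical saved training parameters and test data.
saved_tic = 15.0
train_clusters = [4964.0, 5493.0, 6645.0]

test_spectrum = normalise_external([2.0, 3.0, 5.0], saved_tic)
hit = assign_to_cluster(5500.0, train_clusters)   # inside the 5493 Da window
miss = assign_to_cluster(6000.0, train_clusters)  # outside every window
```

Reusing the training parameters in this way keeps the test spectra from influencing the clusters on which the classifier was built.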

The cluster information for each train and test set (containing sample ID and sample group, cluster
mass values and cluster signal intensities for each spectrum within the sets) was transformed into an
interchangeable data format (a .csv table) using the "Sample group statistics" function of the
"Biomarker Wizard" tool of the ProteinChip Software Version 3.1. In this format, the data can be
analysed by a specific software for the generation of regression and classification trees (see examples
5 to 7).
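
Reading such a table for downstream analysis can be sketched with the Python standard library. The column layout below (a sample identifier, a group label, then one intensity column per cluster mass) is a hypothetical example; the actual layout of the exported file is not given in the text.

```python
import csv
import io

# Hypothetical layout: one row per spectrum, one column per cluster mass.
CSV_TEXT = """sample_id,group,5493,6645,4964
S001,cancer,12.4,3.1,0.8
S002,control,2.2,5.0,1.9
"""

def load_cluster_table(text):
    """Return (feature_names, rows); each row is
    (sample_id, group, list of cluster intensities)."""
    reader = csv.DictReader(io.StringIO(text))
    features = [name for name in reader.fieldnames
                if name not in ("sample_id", "group")]
    rows = []
    for rec in reader:
        intensities = [float(rec[name]) for name in features]
        rows.append((rec["sample_id"], rec["group"], intensities))
    return features, rows

features, rows = load_cluster_table(CSV_TEXT)
```

The resulting feature list and rows have the shape a tree-building program needs: one class label and one intensity vector per spectrum.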

Example 4. Construction of classifiers. 

Four classifiers with binary target variable (cancer versus non-cancer) were constructed: First, as a
proof of principle, a classifier was constructed only on the basis of the training set described above.
Second, a final classifier was constructed on the basis of all available mass peaks and all colon cancer
samples, fusing the corresponding training and test data sets. Third, a 2nd final colon classifier was
constructed analogously to the first final colon cancer classifier but excluding the most informative
and dominating mass of the first final colon classifier. Fourth, a 3rd final colon classifier was
constructed analogously to the first final colon cancer classifier but excluding the most informative
and dominating masses of the first and 2nd final colon classifier.

Forward variable selection was applied in order to determine highly informative sets of variables
("patterns") for classification. The results of the present invention were generated using the "CART"
decision tree approach (classification and regression trees; Breiman et al., 1984). Moreover, bagging
of classifiers was applied to overcome typical instabilities of forward variable selection procedures,
thereby increasing overall classifier performance (Breiman, 1994).




More precisely, for the training set 50 bootstrap samples were generated (sampling with replacement,
maximal 3 sample redraws). For each bootstrap sample an exploratory decision tree was generated.
Nodes were split using the Gini rule until all final nodes were either pure, i.e., contained only samples
of one class, or until one of the following stopping rules was met: no nodes comprising less than 4
cases were split and no splits were considered resulting in a node comprising only one sample. The
50 single classifiers thus obtained, one for each bootstrap sample, were combined to constitute an
ensemble of classifiers predicting class membership by plurality vote.
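
The procedure described above (bootstrap sampling, Gini-based splitting with the two stopping rules, and plurality voting) can be sketched in plain Python. This is an illustrative re-implementation under assumptions, not the software used in this work: "maximal 3 sample redraws" is interpreted here as at most three copies of any sample per bootstrap sample, thresholds are placed midway between neighbouring values, and the toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, min_child=2):
    """Return (gain, feature, threshold) of the best Gini split, or None."""
    n, parent, best = len(y), gini(y), None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][j] <= thr]
            right = [y[i] for i in range(n) if X[i][j] > thr]
            if len(left) < min_child or len(right) < min_child:
                continue  # no split may leave a node with a single sample
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or gain > best[0]:
                best = (gain, j, thr)
    return best

def grow_tree(X, y, min_split=4):
    """Grow a tree until nodes are pure or a stopping rule applies."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or len(y) < min_split:
        return majority  # pure node, or fewer than 4 cases: do not split
    split = best_split(X, y)
    if split is None or split[0] <= 0.0:
        return majority
    _, j, thr = split
    left = [i for i in range(len(y)) if X[i][j] <= thr]
    right = [i for i in range(len(y)) if X[i][j] > thr]
    return (j, thr,
            grow_tree([X[i] for i in left], [y[i] for i in left], min_split),
            grow_tree([X[i] for i in right], [y[i] for i in right], min_split))

def predict_tree(tree, row):
    """Follow the splits until a leaf (a class label) is reached."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if row[j] <= thr else right
    return tree

def bootstrap_indices(n, rng, max_draws=3):
    """Sample n indices with replacement, each drawn at most max_draws times."""
    counts, chosen = Counter(), []
    while len(chosen) < n:
        i = rng.randrange(n)
        if counts[i] < max_draws:
            counts[i] += 1
            chosen.append(i)
    return chosen

def bagged_trees(X, y, n_trees=50, seed=0):
    """Build one tree per bootstrap sample."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = bootstrap_indices(len(y), rng)
        trees.append(grow_tree([X[i] for i in idx], [y[i] for i in idx]))
    return trees

def predict_ensemble(trees, row):
    """Plurality vote over all trees."""
    return Counter(predict_tree(t, row) for t in trees).most_common(1)[0][0]

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [1.1, 6.0], [0.8, 5.5], [1.3, 3.5],
     [3.0, 5.0], [3.2, 3.1], [2.9, 4.2], [3.1, 6.1], [2.8, 5.4], [3.3, 3.6]]
y = ["control"] * 6 + ["cancer"] * 6

trees = bagged_trees(X, y)
low = predict_ensemble(trees, [0.95, 4.5])
high = predict_ensemble(trees, [3.05, 4.5])
```

Voting over many trees grown on different bootstrap samples damps the effect of any single unstable split, which is the motivation for bagging given in the text.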

The procedure of classifier construction was conducted four times to obtain one proof-of-principle
classifier and three final classifiers for colon cancer detection.

Example 5. Classifier structure.

The proof-of-principle classifier employed 71 masses (variables) out of 90 determined signal clusters.
Single decision trees consisted of 4 to 9 variables (5 to 10 end nodes), 6 variables being typical, see
histogram of Figure 4. Variable importance was roughly deduced by overall improvement, i.e., for
each mass we summed the improvement values achieved in the generation of all 50 decision trees of
the decision tree ensemble. The masses used by the proof-of-principle classifier are listed in Table 1
(starting with most important masses having high improvement). An overview of the distribution of
masses is given in Figure 5.
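
Summing the improvement values per mass across all trees can be sketched as follows. The representation of each tree as a list of (mass, improvement) split records and the numeric values are hypothetical; only the summation and ranking rule follow the text.

```python
from collections import defaultdict

def overall_improvement(trees):
    """Sum per-split improvement values by mass across all trees and
    return (mass, total) pairs ranked from highest to lowest total."""
    totals = defaultdict(float)
    for splits in trees:
        for mass, improvement in splits:
            totals[mass] += improvement
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical split records (mass in Da, improvement) for three trees.
trees = [
    [(5493, 0.25), (4964, 0.05)],
    [(5493, 0.30), (8781, 0.02)],
    [(4964, 0.04), (5493, 0.20)],
]
ranking = overall_improvement(trees)
```

A mass that is chosen early in many trees accumulates a large total, which is why a single dominating mass can head the ranking by a wide margin.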


The 1st final classifier for colon cancer employed 75 masses out of 90 determined signal clusters.
Single decision trees consisted of more variables than in the proof-of-principle classifier; 9 variables
were typical, see histogram of Figure 6. Variable importance was roughly deduced by overall
improvement. The masses used by the 1st final classifier are listed in Table 2 (starting with most
important masses, i.e. masses with highest improvement values). An overview of the distribution of
masses of the 1st final classifier is given in Figure 7.

The 2nd final classifier for colon cancer employed 77 masses out of 90 determined signal clusters.
Single decision trees consisted of even more variables than in the 1st final classifier; 10 variables were
typical, see histogram of Figure 8. Variable importance was roughly deduced by overall improvement.
The masses used by the 2nd final classifier are listed in Table 3 (starting with most important masses,
i.e. masses with highest improvement values). An overview of the distribution of masses of the 2nd
final classifier is given in Figure 9.

The 3rd final classifier for colon cancer employed 80 masses out of 90 determined signal clusters.
Single decision trees consisted of even more variables than in the 1st final classifier; 10 variables were
typical, see histogram of Figure 10. Variable importance was roughly deduced by overall
improvement. The masses used by the 3rd final classifier are listed in Table 4 (starting with most
important masses, i.e. masses with highest improvement values). An overview of the distribution of
masses of the 3rd final classifier is given in Figure 11.

With the exception of mass 10722 Da, the classifiers include all of the differentially expressed 
biomolecules found in this study. 

Example 6. Classification performance. 

Classification performance was determined for the proof-of-principle classifier on the colon cancer
versus endoscopy control test data set as well as on a separate test set consisting of presumably healthy
blood donors. The classifier achieved 93% sensitivity and 84% specificity on the cancer versus
endoscopy controls test data set and 94% specificity on 77 samples of blood donors.
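
The reported percentages can be related to sample counts with a short sketch. The counts 13 of 14, 16 of 19 and 72 of 77 are not stated in the text; they are assumptions, chosen as the only whole-number counts that round to the reported 93%, 84% and 94% for test groups of 14, 19 and 77 samples.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of cancer samples classified as cancer."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of non-cancer samples classified as non-cancer."""
    return true_neg / (true_neg + false_pos)

# Assumed counts (not given in the text), consistent with the group sizes.
sens_test = sensitivity(true_pos=13, false_neg=1)    # 14 cancer patients
spec_test = specificity(true_neg=16, false_pos=3)    # 19 endoscopy controls
spec_donors = specificity(true_neg=72, false_pos=5)  # 77 blood donors

percentages = [round(100 * v) for v in (sens_test, spec_test, spec_donors)]
```

With test groups this small, a single misclassified sample shifts sensitivity by about 7 percentage points and specificity on the endoscopy controls by about 5.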

For the three final classifiers, we determined their specificity on 77 samples of blood donors. We
obtained 92% specificity for the 1st final classifier, 100% specificity for the 2nd final classifier, and
92% specificity for the 3rd final classifier.

Table 1: Ranking of masses of proof-of-principle classifier by overall improvement.

mass         improvement  mass         improvement  mass         improvement
5493         11.397       6447         0.193        11465        0.048
4964         0.915        15879        0.193        8703         0.046
[illegible]  [illegible]  4719         0.188        13290        0.045
12619        0.589        3228         0.176        4607         0.041
8781         0.511        17263        0.17         3457         0.04
3941         0.483        15005        0.159        8215         0.039
7576         0.464        17617        0.157        3027         0.038
10595        0.446        2509         0.155        9360         0.038
22952        0.442        9078         0.153        5113         0.031
6852         0.415        4104         0.132        4295         0.03
3327         0.409        13633        0.127        17890        0.028
22467        0.405        7000         0.122        11694        0.027
24080        0.398        2733         0.105        11905        0.026
2021         0.359        9202         0.095        4546         0.025
12829        0.346        16105        0.086        16164        0.025
8575         0.342        18116        [illegible]  [illegible]  0.014
2270         0.323        9718         0.08         22339        0.013
9143         0.267        4242         0.069        15957        0.012
4866         0.229        6898         0.067        4830         0.011
4359         0.225        4476         0.066        5854         0.011
2049         0.223        8922         0.066        5773         0.009
8077         0.214        7658         0.062
13784        0.202        8474         0.058
22677        0.202        12470        0.058
17397        0.198        5648         0.052





Table 2: Ranking of masses of 1st final classifier by overall improvement.

mass         improvement  mass         improvement  mass         improvement
5493         12.849       17890        0.157        3947         0.056
6645         1.216        10595        0.156        2733         0.051
4964         0.907        7658         0.148        9581         0.046
8781         0.559        11216        0.147        28259        0.045
12829        0.494        2509         0.141        4607         0.044
15879        0.392        3228         0.141        4546         0.042
2021         0.363        16105        0.128        9930         0.039
22952        0.353        22467        0.112        17617        0.039
2270         0.323        9360         0.111        3457         0.038
28055        0.305        4476         0.099        22677        0.036
18116        0.3          4830         0.093        13633        0.033
8077         0.298        9143         0.088        11694        0.032
6852         0.268        10369        0.088        11905        0.031
2049         0.252        17767        0.085        8703         0.028
4359         0.239        4242         0.083        11465        0.024
8575         0.233        6447         0.078        13983        0.024
24080        0.232        22339        0.078        9078         0.022
12619        0.197        15005        0.075        14798        0.022
7576         0.179        4719         0.073        16953        0.021
12470        0.168        7000         0.064        13290        0.021
4104         0.166        5113         0.062        11547        0.02
15957        0.165        9202         0.062        5648         0.011
17263        0.165        4866         0.058        5226         0.01
5854         0.161        16164        0.058        6898         0.01
3327         0.161        3027         0.057        5773         0.009

Table 3: Ranking of masses of 2nd final classifier by overall improvement.

mass         improvement  mass         improvement  mass         improvement
3947         5.672        9360         0.187        8575         0.068
12829        2.203        3027         0.179        10369        0.066
6645         1.472        4866         0.169        17767        0.063
4964         1.441        12470        0.163        15350        0.056
8077         1.158        9078         0.148        11216        0.046
28055        1.072        2509         0.147        17890        0.044
15957        0.912        6898         0.142        8703         0.039
6852         0.811        10595        0.139        4295         0.036
12619        0.539        7576         0.135        15005        0.036
24080        0.393        8781         0.116        22677        0.036
3327         0.385        22339        0.115        9581         0.031
28259        0.34         5854         0.114        9426         0.03
2021         0.337        2270         0.11         13290        0.027
16105        0.316        6447         0.106        15879        0.026
11694        0.315        22952        0.104        17397        0.023
4104         0.299        4242         0.092        5648         0.022
2049         [illegible]  10215        0.092        [illegible]  0.022
4719         0.27         5113         0.09         8474         0.019
16164        0.25         9202         0.089        10440        0.016
3457         0.241        9143         0.086        4359         0.009
4546         0.238        13983        0.082        5226         0.008
17263        0.232        4830         0.081        7000         0.006
16953        0.228        4476         0.08         7658         0.006
2733         0.225        11465        0.072
22467        0.218        18116        0.071
5773         0.193        15140        0.07
3228         0.19         4607         0.068

Table 4: Ranking of masses of 3rd final classifier by overall improvement.

mass         improvement  mass         improvement  mass         improvement
4964         3.431        10595        0.187        15140        0.047
12829        2.166        7658         0.183        7000         0.046
6645         1.999        9078         0.183        22467        0.044
28055        1.288        8781         0.171        10369        0.042
28259        1.152        5773         0.144        18390        0.042
6852         1.089        2270         0.134        13290        0.041
[illegible]  0.781        5113         0.133        6898         0.038
16105        0.737        7576         0.132        17767        0.038
[illegible]  0.736        9143         0.131        8703         0.036
[illegible]  0.714        6447         0.128        13633        0.036
[illegible]  0.705        2733         0.111        15005        0.036
[illegible]  0.666        18116        0.109        15350        0.032
[illegible]  0.615        4607         0.104        13784        0.031
[illegible]  0.485        11694        0.104        17617        0.029
[illegible]  0.403        15879        0.1          14798        0.027
4242         0.329        9202         0.099        17397        0.026
4719         0.304        10215        0.092        5226         0.026
12470        0.292        4476         0.089        9426         0.026
9360         0.283        9581         0.089        5648         0.022
3457         0.279        11905        0.086        8474         0.019
22952        0.275        4359         0.079        8575         0.019
2509         0.261        4295         0.075        10440        0.016
4104         0.245        4866         0.068        17263        0.009
2049         0.23         9718         0.068        11216        0.008
24080        0.219        11465        0.062
16164        0.201        13983        0.062
3228         0.198        22339        0.056
5854         0.192        3027         0.047

We claim:






1. A method for the differential diagnosis of a colorectal cancer and/or a non-malignant disease
of the large intestine, in vitro, comprising:

a) obtaining a test sample from a subject,

b) contacting the test sample with a biologically active surface under specific binding
conditions,

c) allowing the biomolecules within the test sample to bind said biologically active
surface,

d) detecting bound biomolecules using a detection method, wherein the detection method
generates a mass profile of said test sample,

e) transforming the mass profile into a computer-readable form, and

f) comparing the mass profile of e) with a database containing mass profiles specific for
healthy subjects, subjects having a precancerous lesion of the large intestine, subjects
having colorectal cancer, subjects having metastasised colorectal cancer, or subjects
having a non-malignant disease of the large intestine,

wherein said comparison allows for the differential diagnosis of a subject as healthy,
having a precancerous lesion of the large intestine, having a colorectal cancer, having a
metastasised colorectal cancer and/or a non-malignant disease of the large intestine.

2. The method of claim 1, wherein the database is generated by

a) obtaining biological samples from healthy subjects, subjects having a precancerous
lesion of the large intestine, subjects having colorectal cancer, subjects having
metastasised colorectal cancer, and subjects having a non-malignant disease of the
large intestine,

b) contacting said biological samples with a biologically active surface under specific
binding conditions,

c) allowing the biomolecules within the biological samples to bind to said biologically
active surface,

d) detecting bound biomolecules using a detection method, wherein the detection method
generates mass profiles of said biological samples,

e) transforming the mass profiles into a computer-readable form,

f) applying a mathematical algorithm to classify the mass profiles in e) as specific for
healthy subjects, subjects having a precancerous lesion of the large intestine, subjects
having colorectal cancer, subjects having metastasised colorectal cancer, and subjects
having a non-malignant disease of the large intestine.

3. The method of claim 1, wherein the biomolecules are characterized by:

a) diluting a sample 1:5 in a denaturation buffer consisting of 7 M urea, 2 M thiourea,
4% CHAPS, 1% DTT, 2% Ampholine, at 0° to 4°C,

b) further diluting said sample 1:10 with a binding buffer consisting of 0.1 M Tris-HCl,
0.02% Triton X-100, pH 8.5 at 0° to 4°C,

c) contacting the sample with a biologically active surface comprising positively charged
quaternary ammonium groups,

d) incubating the treated sample with said biologically active surface for 120 minutes
under temperatures between 20 and 24°C at pH 8.5,

e) and analysing the bound biomolecules by gas phase ion spectrometry.

4. The method of claim 1, wherein the detection method is mass spectrometry.

5. The method of claim 4, wherein the method of mass spectrometry is selected from the group
of matrix-assisted laser desorption ionization/time of flight (MALDI-TOF), surface enhanced
laser desorption ionisation/time of flight (SELDI-TOF), liquid chromatography, MS-MS
and/or ESI-MS.

6. The method of claim 1, wherein the biologically active surface comprises an adsorbent
selected from the group of quaternary ammonium groups, carboxylate groups, groups with
alkyl or aryl chains, groups such as nitriloacetic acid that immobilize metal ions, or proteins,
antibodies, or nucleic acids.

7. The method of claim 1, wherein the mass profiles comprise a panel of one or more
differentially expressed biomolecules.

8. The method of claim 7, wherein the biomolecules are selected from a group having
the apparent molecular mass of 2020 Da ± 10 Da, 2049 Da ± 10 Da, 2270 Da ± 11 Da, 2508
Da ± 13 Da, 2732 Da ± 14 Da, 3026 Da ± 15 Da, 3227 Da ± 17 Da, 3326 Da ± 17 Da, 3456
Da ± 17 Da, 3946 Da ± 20 Da, 4103 Da ± 21 Da, 4242 Da ± 21 Da, 4295 Da ± 21 Da, 4359
Da ± 22 Da, 4476 Da ± 22 Da, 4546 Da ± 23 Da, 4607 Da ± 23 Da, 4719 Da ± 24 Da, 4830
Da ± 24 Da, 4865 Da ± 24 Da, 4963 Da ± 25 Da, 5112 Da ± 26 Da, 5226 Da ± 26 Da, 5493
Da ± 27 Da, 5648 Da ± 28 Da, 5772 Da ± 29 Da, 5854 Da ± 29 Da, 6446 Da ± 32 Da, 6644
Da ± 33 Da, 6852 Da ± 34 Da, 6897 Da ± 34 Da, 6999 Da ± 35 Da, 7575 Da ± 38 Da, 7657
Da ± 38 Da, 8076 Da ± 40 Da, 8215 Da ± 41 Da, 8474 Da ± 42 Da, 8574 Da ± 43 Da, 8702
Da ± 44 Da, 8780 Da ± 44 Da, 8922 Da ± 45 Da, 9078 Da ± 45 Da, 9143 Da ± 46 Da, 9201
Da ± 46 Da, 9359 Da ± 47 Da, 9425 Da ± 47 Da, 9581 Da ± 48 Da, 9641 Da ± 48 Da, 9718
Da ± 49 Da, 9930 Da ± 50 Da, 10215 Da ± 51 Da, 10369 Da ± 52 Da, 10440 Da ± 52 Da,
10594 Da ± 53 Da, 11216 Da ± 56 Da, 11464 Da ± 57 Da, 11547 Da ± 58 Da, 11693 Da ± 58
Da, 11905 Da ± 60 Da, 12470 Da ± 62 Da, 12619 Da ± 63 Da, 12828 Da ± 64 Da, 13290 Da ±
66 Da, 13632 Da ± 68 Da, 13784 Da ± 69 Da, 13983 Da ± 70 Da, 14798 Da ± 74 Da, 15005
Da ± 75 Da, 15140 Da ± 76 Da, 15350 Da ± 77 Da, 15879 Da ± 79 Da, 15957 Da ± 80 Da,
16104 Da ± 81 Da, 16164 Da ± 81 Da, 16953 Da ± 85 Da, 17263 Da ± 86 Da, 17397 Da ± 87
Da, 17617 Da ± 88 Da, 17766 Da ± 89 Da, 17890 Da ± 89 Da, 18115 Da ± 91 Da, 18390 Da ±
92 Da, 22338 Da ± 112 Da, 22466 Da ± 112 Da, 22676 Da ± 113 Da, 22951 Da ± 115 Da,
24079 Da ± 120 Da, 28055 Da ± 140 Da and/or 28259 Da ± 141 Da.

9. A method for the identification of differentially expressed biomolecules wherein the
biomolecules of any of claims 1-8 are proteins, comprising:

a) chromatography and fractionation,

b) analysis of fractions for the presence of said differentially expressed proteins and/or
fragments thereof, using a biologically active surface,

c) further analysis using mass spectrometry to obtain amino acid sequences encoding
said proteins and/or fragments thereof, and

d) searching amino acid sequence databases of known proteins to identify said
differentially expressed proteins by amino acid sequence comparison.

10. The method of claim 9, wherein the method of chromatography is selected from high
performance liquid chromatography (HPLC) or fast protein liquid chromatography (FPLC).

11. The method of claim 9, wherein the mass spectrometry used is selected from the group of
matrix-assisted laser desorption ionization/time of flight (MALDI-TOF), surface enhanced
laser desorption ionisation/time of flight (SELDI-TOF), liquid chromatography, MS-MS
and/or ESI-MS.

12. A method for the differential diagnosis of a colorectal cancer and/or a non-malignant disease
of the large intestine, in vitro, comprising detection of one or more differentially expressed
biomolecules wherein the biomolecules are polypeptides, comprising:

a) obtaining a test sample from a subject,

b) contacting said sample with a binding molecule specific for a differentially expressed
polypeptide identified in claims 9-11,

c) detecting the presence or absence of said polypeptide(s),

wherein the presence or absence of said polypeptide(s) allows for the differential
diagnosis of a subject as healthy, having a precancerous lesion of the large intestine,
having a colorectal cancer, having a metastasised colorectal cancer and/or a non-malignant
disease of the large intestine.



13. A kit for the diagnosis of a colorectal cancer or a non-malignant disease of the large intestine
comprising the method of claim 1 and further comprising a denaturation solution, a binding
solution, a washing solution, a biologically active surface comprising an adsorbent, and
instructions to use the kit.

14. A kit for the diagnosis of a colorectal cancer or a non-malignant disease of the large intestine 
comprising the method of claim 12, and further comprising a solution, binding molecule, 
detection substrate, and instructions to use the kit. 

15. The method of any one of claims 1-14, wherein the colorectal cancer is a cancer of the colon
or rectum.

16. The method of any one of claims 1-14, wherein the test sample is a blood, blood serum,
plasma, nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic fluid, excreta,
tears, saliva, sweat, biopsy, ascites, cerebrospinal fluid, milk, lymph, or tissue extract sample.

17. The method of any one of claims 1-14, wherein the biological sample is a blood, blood serum, 
plasma, nipple aspirate, urine, semen, seminal fluid, seminal plasma, prostatic fluid, excreta, 
tears, saliva, sweat, biopsy, ascites, cerebrospinal fluid, milk, lymph, or tissue extract sample. 

18. The method of any one of claims 1-14, wherein the subject is of mammalian origin.

19. The subject of claim 18, wherein the subject is of human origin.

ABSTRACT

The present invention provides biomolecules and the use of these biomolecules for the differential
diagnosis of colorectal cancer or a non-malignant disease of the large intestine. In particular, the
present invention provides methods for detecting biomolecules within a test sample as well as a
database comprising mass profiles of biomolecules specific for healthy subjects, subjects having a
precancerous lesion of the large intestine, subjects having a colorectal cancer or a metastasised
colorectal cancer or subjects having a non-malignant disease of the large intestine. Furthermore, the
present invention provides methods for the characterization of said biomolecules using gas phase ion
spectrometry. In addition, the present invention provides methods for the identification of said
biomolecules provided that they are proteins or polypeptides. The invention further provides kits for
the differential diagnosis of colorectal cancer or a non-malignant disease of the large intestine.

Figure 2C
Figure 2D
Figure 3A
Figure 3E
Figure 3F
Figure 4
Figure 9
Figure 10
Figure 11